Sunday, 20 September 2015

HTTPS Support for a Casablanca Server and Certificate Error 1312


In the previous post about Casablanca C++ REST framework I said that our next task would be adding HTTPS support to our solution. So let's do it.

1. The principle

We start with the server side. On the surface it is rather simple: you just specify "https://" as the protocol the server is using, and voilà, everything is done. For example:
  _restListener = http_listener(U("https://localhost:8111/test"));
But if we peek into the code, we'll see that server-side HTTPS is only supported on Windows as of now (Casablanca 2.5.0). OK, let's test it! We start the server, let Chrome fetch some data from the newly created SSL URL*, and... it won't work, apparently there's no such URL! What? We've just started our server there, and it's running.

After a second peek into Casablanca's code, we learn that it's leveraging the Windows HTTP Server API, but does not configure any HTTPS support by itself. Then we learn from the MSDN docs that this is usually done in a configuration step, before the HTTP server is started. Apparently a programmatic solution is possible too, but we start with the simple one (or so we think...)**.

In short, we have to attach a server certificate to a port on the machine where our server runs (i.e. our developer machine, at least in this post). For that, we first have to create a new certificate.

2. First try

For the sake of our tests we want to create a self-signed certificate. On Windows we can use the MakeCert.exe tool for that. So I opened the VisualStudio developer console and typed***:

  makecert.exe -sr LocalMachine -ss Root -a sha512 -n "CN=my_machine_name" -sky exchange -pe -r

to create a self-signed (-r) cert for the local machine (required for the add sslcert step later), export its private key (-pe), use it for exchange (-sky) and store it straight away in the trusted Root Certification Authorities store. OK, it worked.

Next we open the MMC, check that the certificate is there and it has no errors. Don't know how? It's explained several times on the Web, for example here.

Now we only need to set the certificate for the server's IP address. The command for that is:

  netsh http add sslcert ipport=0.0.0.0:8111 certhash=XXXX appid={yyyyy}

For the certhash we can use the thumbprint from the certificate's Properties window in MMC. For the application ID the most frequent advice is to use the ID of the IIS Server, but as we don't even have it installed on our machine, we prefer an empty GUID: {00000000-0000-0000-0000-000000000000}, which is allowed too! Should be working now!

Alas, the dreaded Error 1312 reared its ugly head:

  Certificate add failed, Error: 1312 A specified logon session does not exist. It may already have been terminated.

WTF? What is that supposed to mean? Many reasons are reported: a bad certificate hash (nope, double-checked it), a missing Windows hot-fix (nope, it's 2010, already installed), a non-exported private key (nope, we did export it), some mysterious cases where certificates got renewed (nope, these are fresh), etc., etc. But definitely not a problem with a logon session, that much is true!
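Incidentally, one frequently reported cause of a bad certhash is a stray space or an invisible Unicode character copied along with the thumbprint from the MMC Properties window. A tiny, purely illustrative helper (the name CleanThumbprint is made up) normalizes the value before you paste it into netsh:

```cpp
#include <cctype>
#include <string>

// Keep only hexadecimal digits: strips spaces and any invisible
// characters that sneak in when copying the thumbprint from MMC.
std::string CleanThumbprint(const std::string& raw)
{
  std::string clean;
  for (unsigned char c : raw)
    if (std::isxdigit(c))
      clean += static_cast<char>(std::toupper(c));
  return clean;
}
```

Running the netsh command with a thumbprint cleaned like this rules out at least one of the usual suspects.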

3. The second try

What should we do? As always, the advice is: go back to the last known working setup. To do this, I followed the advice in this blogpost quite meticulously, and created two certificates:

  makecert -sk testRootCA -sky signature -sr localmachine -n "CN=RootTrustedCA" -ss TRUST -r RootTrustedCA.cer

  makecert -sk testServer -ss MY -sky exchange -sr localmachine -n "CN=Server" -ic RootTrustedCA.cer -is TRUST Server.cer -pe


The first one is supposed to be the base, trusted one; the second one references the first. That didn't cut it either, because the RootTrustedCA still wasn't trusted - I had to add it to the Root store using MMC (don't know how? - look here).

This time I didn't want to type the certificate hash and used this nice tool to apply the certificate to the desired IP address, and guess what - it worked! The last problem was that the domain and the certificate's names didn't match, so I created a cert with my machine's name:

  makecert -sk -ss MY -sky exchange -sr localmachine -n "CN=my_machine_name" -ic RootTrustedCA.cer -is TRUST my_machine_name.cer -pe

I applied it to 0.0.0.0:8111 and, at last, everything worked like a charm:




4. Conclusion

As it seems, add sslcert (and by extension, Windows' HTTP Server API) only supports certificates which reference a base trusted one, and doesn't like the ones from the Root store. Remember: the certificate has to be in the User store and has to be in the computer account (i.e. for the local machine)!

Update: The above conclusion is in principle correct, but only if we add two small words: "by default"! As explained in a friendly comment, add sslcert accepts a command-line switch (certstorename) where the desired certificate store can be specified!

OK, that was a little tiring; we'll add support for the client side in another post.

PS: Later I had to add configuration of the SSL certificates to our server configuration code, lots of Windows system programming :-/.

--
* I mean TLS, but for historical reasons...

** Of course in a real server we'd implement a programmatic solution for configuring the SSL options for a given URL. You may look up the HTTP Server API's functions HttpSetServiceConfiguration and HttpQueryServiceConfiguration.

*** not quite, at first I tried to create a private certificate referencing an already existing, trusted one with:

  makecert.exe -sr LocalMachine -ss MY -a sha512 -n "CN=my_machine_name" -sky exchange -pe -eka 1.3.6.1.5.5.7.3.1

here the implicit Root Agency certificate is used. The problem is that the created certificate was broken; MMC said: "This certificate has an invalid digital signature". After much ado it turned out that the referenced Root Agency certificate uses a public key which is too short (only 512 bits!!!) and is thus considered insecure. Thanks for a very informative error message!


Casablanca C++ REST Framework - One Year Later.


Approximately one year ago I started a new project for one of my customers, with the aim of adding a REST interface to their redesigned product. The new release was to transform their hitherto desktop-only and Windows-only application into a cross-platform, distributed, client-server one, using an HTTP API for communication.

The technologies to be used were Qt, C++11, Windows and Apple's OS X. The question was how to implement the REST interface.

The Setup

After investigating some choices*, I settled on Microsoft's C++ REST SDK, code-named Casablanca. It was cross-platform (sic!), used modern C++ constructs (C++11, you know that I like that!) and was open source (sic!). Sounds like it wasn't Microsoft, but I think we still need some time to get used to the new Microsoft.

There were some problems, though. The client chose the Qt 5 framework for portability, and initially I was worried whether Casablanca would play well with Qt's "I am the world" attitude, its message pump and its threading model. Moreover, the server implementation resides in an "experimental" namespace, which normally isn't a good sign either!

On the positive side, there was JSON support and a nice asynchronous file transfer implementation based on PPLX tasks (on Windows; for Linux, OS X, etc. Microsoft wrote a port). This was a big one, as the main functionality of the server would be processing files, and the input files would mostly be uploaded from other machines. And of course the biggest one - it's open source!

So the endeavor was not without risk! What can I say now, a year (give or take a couple of months) after we started?

Highlights

One of the highlights is of course the task-based implementation of asynchronous processing used in Casablanca's API, as here, in the already mentioned built-in support for file transfers:
CCasablancaFileTransfer::Task CCasablancaFileTransfer::StartFileDownload(const QString& downloadUrl, const QString& localFilepath) const
{
  using concurrency::streams::istream;
  using concurrency::streams::streambuf;
  using concurrency::streams::file_buffer;

  web::http::uri url(downloadUrl.toStdString());
  web::http::client::http_client client(url);

  web::http::http_request getRequest(web::http::methods::GET);
  getRequest.headers().add(web::http::header_names::accept, "application/octet-stream"); 

  return client.request(getRequest)
    .then([=](pplx::task<web::http::http_response> previousTask)
  {
    try
    {
      auto response = previousTask.get();

      if (response.status_code() != web::http::status_codes::OK)
      {
        QString errTxt = ".....";
        return pplx::task_from_result(std::make_pair(false, errTxt));
      }

      try
      { 
        streambuf<uint8_t> localFile = file_buffer<uint8_t>::open(localFilepath.toStdString()).get();

        return response.body().read_to_end(localFile)
          .then([=](pplx::task<size_t> previousTask)
        {
          streambuf<uint8_t>& nonconstFile = const_cast<streambuf<uint8_t>&>(localFile);
          nonconstFile.close().get();

          // ETag?
          QString maybeEtag = ftutil::FindHeader(response, web::http::header_names::etag);

          return pplx::task_from_result(std::make_pair(true, maybeEtag));
        });
      }
      catch (...)
      {
        return TranslateFileException(localFilepath);
      }
    }
    catch (...)
    {
      return TranslateWebException();
    }   
  });
}
Please notice that each block following a .then() will be executed asynchronously, in a separately scheduled thread (or task), when the preceding step finishes! You can do the same on the client side, of course. Alternatively, you can force blocking processing of a task by calling its get() method.
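To illustrate the idea without dragging in Casablanca: here is a minimal then() stand-in built on plain std::future - a sketch only, not pplx's implementation. The continuation runs in its own asynchronously scheduled task and receives the result of the previous step once that step has finished.

```cpp
#include <future>
#include <utility>

// A toy then(): schedule func to run once prev has produced its value.
// Not pplx - just the same continuation-chaining idea on std::future.
template <typename T, typename F>
auto then(std::future<T> prev, F func)
    -> std::future<decltype(func(std::declval<T>()))>
{
  return std::async(std::launch::async,
      [](std::future<T> p, F f) { return f(p.get()); },
      std::move(prev), std::move(func));
}
```

Calling get() on the returned future blocks until the whole chain has finished, just like calling get() on the final pplx task.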

If you like comparisons, you may have a look at Facebook's Futures in fbthrift. They generally work like Casablanca's tasks, but have an additional nice onError() clause and even the possibility to choose a specific executor!

Note: I won't give an introduction to Casablanca here; the basic usage has been explained several times on the Web (look here and here for basic client usage, here for a basic server example, and here for file transfers). However, what I found missing in all the intro material I've seen is a mention of exception propagation between asynchronous tasks. The problem is that a thrown exception has to be "observed" by the library user; if it isn't observed, Casablanca will "fail fast", i.e. take down the server in the destructor of the task that threw. Surprise, surprise, your server is crashing! An exception is observed (i.e. marked as such and then rethrown) if the .get() or wait() methods are called on that task or one of its continuations. So be cautious! The above code thus needs an additional try-catch clause around the final .get() call, but I omitted it for the sake of simplicity...
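The observation rule is easy to demonstrate with plain std::future, which also stores the exception and rethrows it at get() - with the (big!) difference that an unobserved std::future exception is silently dropped, while an unobserved pplx exception takes your server down:

```cpp
#include <future>
#include <stdexcept>
#include <string>

// The exception thrown inside the async task is stored, and rethrown
// only when get() is called - that's where it gets "observed".
std::string RunAndObserve()
{
  std::future<int> task = std::async(std::launch::async,
      []() -> int { throw std::runtime_error("boom"); });
  try
  {
    task.get();            // the exception is observed here...
  }
  catch (const std::runtime_error& e)
  {
    return e.what();       // ...and handled instead of crashing anything
  }
  return "no exception";
}
```

With pplx the shape is the same: wrap the final .get() (or wait()) in a try-catch, and the exception from any step of the chain lands there.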

So it didn't take long, and I could announce:

Problems

1. The only really big problem that hit us was performance. The trouble was that after a couple of thousand file uploads or downloads, server performance tumbled into free fall: the same basic polling which normally took 2-3% of CPU time surged to 20-30%! That was a real show stopper at first. And of course it wasn't our code that showed up in the profiler, it was some Casablanca internals!

It took me about 2 weeks to investigate, and maybe I'll write a more detailed post about it some time, but for now it suffices to say that the Windows Concurrency Runtime (i.e. the native, task-based implementation of PPL) was left with an ever-growing internal list, which was scanned sequentially on each tick of the scheduler - at least in our environment of Windows 7 and Visual Studio 2013 plus an early Casablanca version (1.2.0).

I reported it to Microsoft, and they reacted pretty quickly - the next release was given a define (CPPREST_FORCE_PPLX) to disable the Concurrency Runtime and switch over to a Windows Threadpool based implementation of PPL. I got the latest version from the development branch, tested it, and voila! our performance problems vanished into thin air. We then waited for the next release (2.5.0), and when it came out we upgraded our code to it, and suddenly everything worked like a charm. BTW, for the Casablanca version for Visual Studio 2015 the Windows Threadpool is the default setting (or so I was told)**.

2. Another problem was the lack of built-in CORS support; I had to implement it myself. It wasn't that difficult, but it's bound to our mixed Casablanca/Qt environment, so unfortunately I couldn't contribute it back to the project.

3. Then there were Qt-specific problems, the first of them Qt's notorious usage of the /Zc:wchar_t- flag on Windows. This means that Qt uses not a native type for the wide character (as the standard would require) but a typedef to unsigned short. Thus you won't be able to link Casablanca into your Qt-compliant project, because std::wstring will be mangled to $basic_string@GU by Qt (and your code) but to $basic_string@_WU by a standard Casablanca build.

The remedy is to build Casablanca on your own (i.e. not to use the NuGet package) with /Zc:wchar_t-. The build will first fail with an error complaining about a double definition of wchar_t, but commenting out the now-redundant definition will suffice. Hmm, when I think of it, I should probably contribute this back to Casablanca with some #ifdef combination...
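A cheap tripwire for that mismatch, which one could drop into a shared header (just a sketch, of course):

```cpp
#include <type_traits>

// On a conforming build wchar_t is a distinct native type; with Qt's
// /Zc:wchar_t- it collapses into unsigned short and this assert fires,
// pointing straight at the flag instead of at a cryptic linker error.
static_assert(!std::is_same<wchar_t, unsigned short>::value,
              "wchar_t is not a native type - check the /Zc:wchar_t setting");
```

In a /Zc:wchar_t- translation unit the message shows up at compile time, long before the $basic_string@GU vs. $basic_string@_WU puzzle at link time.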

4. Another Qt-related problem I stumbled upon was nothing less than a new type of deadlock (at least for me). I'll dub it the "big signal-mutex confusion" deadlock. It's quite interesting though: imagine that your handler does one half of its work in a Casablanca thread but the second half in the main Qt thread. Then use some lock to protect a resource from parallel access.

Now imagine you lock the mutex in the Casablanca part of the GET handler and emit a signal to continue processing in the Qt context. Meanwhile another handler (e.g. DELETE) didn't need the lock in its Casablanca context, already emitted a signal, and is now in the Qt context trying to lock the mutex. The problem is that the second part of the GET handler will never execute, as the message pump is blocked waiting for the mutex, which can only be unlocked after the next signal is processed - deadlock. Mind-boggling? Well, threads are hard. Remedy: always lock the resource in the Casablanca part of the handler, even if you don't need it there.

Admittedly, the problem looks somewhat artificial, but that's a consequence of the somewhat inane requirement that the requests be finally processed in the Qt context - a requirement originating from another part of the system, which I won't discuss or disparage here.

5. A minor problem was timestamp comparison using Casablanca's utility::datetime class:
int CBasicLastModifiedMap::CompareSec(const utility::datetime& lhs, const utility::datetime& rhs)
{
#if _DEBUG
  // TEST:
  auto lhsStrg = lhs.to_string();
  auto rhsStrg = rhs.to_string();
#endif

  int timestampDiffSec = lhs - rhs; // truncated to seconds!

  if (timestampDiffSec == 0)
    return 0;
  // extra check, because timestampDiffSec is always > 0 (Casablanca problem!)
  else if (lhs.to_interval() > rhs.to_interval())
    return 1;
  else
    return -1;
}
Bug or feature? - decide by yourself.
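For comparison: the same second-granularity three-way comparison written against std::chrono (instead of utility::datetime) keeps the sign of the difference, so no extra check is needed. A sketch with a made-up name:

```cpp
#include <chrono>

// Truncate both time points to whole seconds since the epoch and compare
// the signed counts - the sign survives, unlike with datetime's operator-.
int CompareSecChrono(std::chrono::system_clock::time_point lhs,
                     std::chrono::system_clock::time_point rhs)
{
  using std::chrono::duration_cast;
  using std::chrono::seconds;
  const auto l = duration_cast<seconds>(lhs.time_since_epoch()).count();
  const auto r = duration_cast<seconds>(rhs.time_since_epoch()).count();
  if (l == r) return 0;
  return l < r ? -1 : 1;
}
```

Two timestamps within the same wall-clock second compare equal; everything else gets a properly signed -1 or 1.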

6. One (minor?) problem we encountered was specifying all interfaces for the server to bind to. Normally you'd expect to be able to use "0.0.0.0", but Casablanca rejects it. After consulting the source code the solution was clear: just use "*" instead!***

Conclusion

Otherwise: we are quite happy!!! The system works cross-platform, the performance is good, no apparent problems there.

Well, no problems till now. My client didn't specify any security for the first version of the product; they assumed (quite reasonably) that customers would use the new product only inside their trusted network at first. Neither is any notion of client identification or client roles planned, nor are we gzipping the HTTP data. Thus the more advanced features of an HTTP server weren't required.

As the next step towards more complicated scenarios, we'll look at HTTPS support in Casablanca. See you then...

Update:

OK, we actually found one weird bug while testing: "http_listener crashes when URI contains a pending square bracket" (bug report + proposed fix here). I had to patch it locally for the time being, but it seems it'll be fixed in version 2.6. Weird - other malformed URLs are rejected OK, only square brackets generate crashes.
Update 2: same problem with square brackets in HTTP parameters, only I haven't resolved it yet (no time, a low-prio bug). So the story continues.
Update 3: The above problem resolved, see my code comments for explanation:
  // As of Casablanca 2.5 http_request::relative_uri() will throw exception if it encounters (correctly) encoded
  // "[", "]" or "#" characters in the query parameters!
  //  - workaround: try to extract the relative path by hand (message.absolute_uri().path() works)

Another update (07 Sep. 2016):

I said we were overall happy with Casablanca, but there's a new problem, which could be pretty grave. Namely, the Casablanca server tends to crash sometimes. In normal operation it's very seldom, but under severe overload it may happen rather more often :(. At first I thought other parts of the code had some dangling pointers, and put this error into the waiting room.

Recently I took some time to analyze it, and it seems to be a genuine Casablanca problem, somehow connected to the handling of timed-out client connections (as it seems at the moment). Release 2.8.0 contained a pull request that refactored this part of the code to remove some race conditions, but I'm not sure this was sufficient to fix the crash...

As soon as I've solved that, I'll blog about it. For the moment take heed: the server part is still in the "experimental" namespace (as of 2.8.0) and:
"The http_listener code is the least tested code in the library despite its age, so there are certainly a lot of issues here that need fixing."
--
* like: QtWebApp (by Stefan Frings), Qxt's WebModule, Tufao, QHttpServer, Pillow, nanogear, Apache's Axis2/C, Casablanca (of course!), cpp-netlib, gSoap WebServer, microhttpd (by GNU), libhttpserver (based on microhttpd), Mongoose, libevhttp. They were (for the most part) fine, sometimes even great, but virtually none of them had C++11 or async file transfer support!

Update: Facebook's Wangle could be a good candidate too; it seems to have decent async support (see here), but...: at that time I didn't find it (definitely a show-stopper ;). At the time of this writing it's got problems compiling on OS X, and I'm not sure if it compiles on Windows at all. There's simply no statement on the project's page about which platforms are supported. AFAIK Thrift (or was it Folly?) seems to compile on Windows, but this compilation mess could be a problem.

** as it seems, the Concurrency Runtime isn't used anymore in Microsoft's STL implementation (at least on OSes newer than XP) - http://blogs.msdn.com/b/vcblog/archive/2015/07/14/stl-fixes-in-vs-2015-part-2.aspx:
"Using ConcRT was a good idea at the time (2012), but it proved to be more trouble than it was worth.  Now we're using the Windows API directly, which has fixed many bugs."
and:
"... std::mutex on top of ConcRT was so slow!"  
*** I was recently asked about that problem in an email by a reader of this post, and I realized I had forgotten to mention it in the first writeup. Sorry!

Thursday, 4 September 2014

Some more thoughts on the (allegedly decided) Static vs Dynamic debate.


I know, I said the debate was decided - at least for me. But one little uneasy spot remained - the inexplicable attraction to Clojure and all things Lisp. It's not a sweet spot for dynamic languages in general - I'm always slightly annoyed when I use Python with its exception-based type system, so there's no risk of cognitive bias here.

Last week I stumbled upon the "Letter to a young Haskell enthusiast" here. It's a relatively long post containing several good pieces of advice, but what resonated with me the most was the following:
Learn to value all opinions, because they all come from experiences, and all those experiences have something to teach us. Dynamic typing advocates have brought us great leaps in JIT techniques. ... Like you, I would rather write in Haskell. But it is not just the tools that matter but the ideas, and you will find they come from everywhere.
So (paraphrasing) I'd rather write in C++, but when seen in that light, it's just a personal opinion or maybe even a preference. Because:
But be glad that others are charting other paths! Who knows what they will bring back from those explorations.
That reminds us that we are merely humans, with no solid underpinning of our knowledge other than "it seems to be working" (science) or "it always worked that way" (non-science). So maybe the "unreasonable" usage of dynamic typing can bring us some big gains that are unthinkable as of today. And maybe the inexplicable attraction to Lisp-y languages is a hunch of this glorious future? ;)

This said, I must reiterate that I can't see any logical (meaning "coming from reasoning", not "if you don't think that way you are a moron") argument to prefer static typing to dynamic typing. But I reiterate it with a much broader and more understanding stance ;).

Saturday, 17 May 2014

Named function arguments for C++, again

This is probably/maybe the final installment in my unofficial series on emulating named function parameters in C++ (see here and here). This time I was inspired by a technique I spotted in the sqlpp11 library (see also the talk here or the slides here).

Want to see it in action? You can call functions like that:
// 2 string parameters
test_func(first_name = "alfa", second_name = "beta");
      
// a string and a bool parameter
test_func_bool("zeta", enable_widgets = true);  

//test_func("alfa"); -- positional usage, not supported!  
The downside is that you must always use the names of the parameters, i.e. you cannot fall back to positional usage. This is intentional, to keep the implementation as simple as possible. It's doable, of course, but for what reason? If you don't need a named parameter at a given position, don't use one.

So how does the implementation work? It's rather minimal, as it doesn't try to do everything possible. Call me a non-genius, but such small code pleases me. Basically, the named argument instance returns an assignment proxy, which remembers the value assigned to it. You say it's unnecessary? And you are totally right - it's only there to tie a name to the given parameter!
template<typename Lhs, typename Rhs>
struct assignment_t
{
  using lhs_t = Lhs;
  using rhs_t = Rhs;
  lhs_t _lhs;
  rhs_t _rhs;   
};
 
template<typename T>
struct named_arg_t
{
  using type = assignment_t<named_arg_t, T>;
  auto operator=(T arg)
    -> assignment_t<named_arg_t, T>
  {
    return { {}, arg };
  }    
};
Try it online at http://www.compileonline.com; you can find the complete code here.

How do you define functions using the named parameters?
// usage string:
named_arg_t<std::string> first_name{};
named_arg_t<std::string> second_name{};
 
void test_func(named_arg_t<std::string>::type first_name, named_arg_t<std::string>::type second_name) 
{
  auto firstname_val = first_name._rhs;
  auto secondname_val = second_name._rhs;
  std::cout << "first name=" << firstname_val 
            << ", second name=" << secondname_val 
            << std::endl;
}
 
// usage bool:
named_arg_t<bool> enable_widgets{};
 
void test_func_bool(const std::string& name, named_arg_t<bool>::type enable_widgets)
{
  if(enable_widgets._rhs)    
  {
    std::cout << "enable widgets for: " << name << std::endl; 
  }
}
You see, there's another caveat - the parameters are still positional; you cannot call the function like this:
// 2 string parameters
test_func(second_name = "beta", first_name = "alfa");
because C++ knows only positional parameters.
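For the curious: order independence could be bolted on by giving every parameter its own tag type and searching the argument pack at compile time. A self-contained sketch (C++14, all names hypothetical, not part of the code above):

```cpp
#include <string>
#include <utility>

template <typename Lhs, typename Rhs>
struct assignment_t
{
  Lhs _lhs;
  Rhs _rhs;
};

// One distinct tag type per parameter name makes the proxies distinguishable.
template <typename Tag, typename T>
struct named_arg_t
{
  assignment_t<named_arg_t, T> operator=(T arg)
  {
    return { {}, std::move(arg) };
  }
};

// Find the argument carrying the requested tag, wherever it sits in the pack.
template <typename Tag, typename T, typename... Rest>
T pick(const assignment_t<named_arg_t<Tag, T>, T>& match, const Rest&...)
{
  return match._rhs;
}

template <typename Tag, typename First, typename... Rest>
auto pick(const First&, const Rest&... rest)
{
  return pick<Tag>(rest...);
}

// hypothetical parameter names
struct first_tag {};
struct second_tag {};
named_arg_t<first_tag, std::string> first_name{};
named_arg_t<second_tag, std::string> second_name{};

template <typename... Args>
std::string greet(const Args&... args)
{
  // arguments may now come in any order
  return pick<first_tag>(args...) + " " + pick<second_tag>(args...);
}
```

With that, greet(second_name = "beta", first_name = "alfa") works just as well as the other order - at the cost of exactly the extra machinery the minimal version deliberately avoids.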

I think this usage is intuitive - you express your wish to have a named parameter (named_arg_t<>) of a given type (string or bool). Then you access the parameter's value using a getter in the function - I think it's a minor inconvenience and quite understandable. The naming of the parameter's value is open to discussion; here it documents the implementation mechanism, which admittedly might not be the best solution for day-to-day usage.

But you say - WTF? You say - use Boost.Parameter! Why did I spend my time on these blogposts if there's a solution already? The reason is that in many projects you either can't use it (company guidelines) or you wouldn't like to use it (there's already a big, do-it-all library in place). Another reason is that "small is beautiful"!

Tuesday, 13 May 2014

QString and QByteArray in VisualStudio 2013 Debugger.

Hi all!

Looking for a quick way to visualize Qt's classes in VS 2013? I was too! In my previous projects I used the Qt Add-In and never thought about it - it was just there! But in the current project I don't feel like using the Add-In (for several reasons, the main one being that it isn't pre-integrated in the development tools), so I looked for a simple visualizer for the Visual Studio debugger. To my utter bewilderment, the solution didn't show up in the first link of the Google query!!!?*

What should we do? A real programmer would write their own visualizer, as described here and here. Or would they? Yes, you are right, they wouldn't - they would just try harder ;). And my stubbornness was rewarded: the code for a custom Qt visualizer for both Qt 4 and Qt 5 can be found here!

Unfortunately for the rest of you, the explanations are in Russian, so you'll have to try harder I guess ;). OK, don't despair - spending my childhood in a communist country finally pays off ;) - here comes a short explanation of how to create and install the visualizer.

1. Take this code and save it as (for example) qt4and5.natvis:
<?xml version="1.0" encoding="utf-8"?>
<AutoVisualizer xmlns="http://schemas.microsoft.com/vstudio/debugger/natvis/2010">

 <!-- Qt 4 -->
    <Type Name="QString">
        <DisplayString>{d->data,su}</DisplayString>
        <StringView>d->data,su</StringView>
        <Expand>
            <Item Name="[size]">d-&gt;size</Item>
            <Item Name="[referenced]">d-&gt;ref._q_value</Item>
            <ArrayItems>
                <Size>d-&gt;size</Size>
                <ValuePointer>d->data,c</ValuePointer>
            </ArrayItems>
        </Expand>
    </Type>
 
    <Type Name="QByteArray">
        <DisplayString>{d->data,s}</DisplayString>
        <StringView>d->data,s</StringView>
        <Expand>
            <Item Name="[size]">d-&gt;size</Item>
            <Item Name="[referenced]">d-&gt;ref._q_value</Item>
            <ArrayItems>
                <Size>d-&gt;size</Size>
                <ValuePointer>d->data,c</ValuePointer>
            </ArrayItems>
        </Expand>
    </Type>
 
    <!-- Qt 5
    <Type Name="QString">
        <DisplayString>{((reinterpret_cast&lt;unsigned short*&gt;(d)) + d->offset / 2),sub}</DisplayString>
        <StringView>((reinterpret_cast&lt;unsigned short*&gt;(d)) + d->offset / 2),sub</StringView>
        <Expand>
            <Item Name="[size]">d-&gt;size</Item>
            <Item Name="[referenced]">d-&gt;ref.atomic._q_value</Item>
            <ArrayItems>
                <Size>d-&gt;size</Size>
                <ValuePointer>((reinterpret_cast&lt;unsigned short*&gt;(d)) + d->offset / 2),c</ValuePointer>
            </ArrayItems>
        </Expand>
    </Type>
 
    <Type Name="QByteArray">
        <DisplayString>{((reinterpret_cast&lt;char*&gt;(d)) + d-&gt;offset),sb}</DisplayString>
        <StringView>((reinterpret_cast&lt;char*&gt;(d)) + d-&gt;offset),sb</StringView>
        <Expand>
            <Item Name="[size]">d-&gt;size</Item>
            <Item Name="[referenced]">d-&gt;ref.atomic._q_value</Item>
            <ArrayItems>
                <Size>d-&gt;size</Size>
                <ValuePointer>((reinterpret_cast&lt;char*&gt;(d)) + d-&gt;offset),c</ValuePointer>
            </ArrayItems>
        </Expand>
    </Type>
    -->

</AutoVisualizer>
2. Copy this file to: C:\Users\Your_User_Name_Here\Documents\Visual Studio 2013\Visualizers

3. Restart VisualStudio!

That's all! After that I'm able to see QString's and QByteArray's contents, and I hope you are too! At the moment I'm still using Qt 4.8 for the current project, but the switch to Qt 5.2 is pending. When it arrives, I'll just uncomment the Qt 5 section in the file and will be happy - or at least I hope so.

--
* OK, a solution for VS 2012 + Qt 5.2 does show up, but my constraints are somewhat more generic than that.

Thursday, 13 March 2014

Another case for embedded DSLs


As already stated in my previous post, I'm not very convinced about embedded DSLs; I quote:
Until recently I'd only sneer at the idea of DSL's: what the @!%*, either we have a solid API for developers, or a specialized higher level language for the specific task. And the higher level language shouldn't be done at all in the best case, because the non-technical user just wants a smooth GUI to click his requests together! So spare your DSL hype on me, go and find another gullible middle management person.
Did you notice the "until recently" part? As it turns out, I found some use for E-DSLs in the end, but a general suspicion remained. Recently, however, I happened to hear a presentation by Joel Falcou, and it showed me another use for E-DSLs!

The talk was about:
... using the Boost.Proto library - a C++ EDSL toolkit - to redesign NT2, a C++ scientific computation library similar to MATLAB in term of interface.*
The talk was mainly about template metaprogramming, expression templates, and integrating hardware descriptions into the system, but one phrase caught my ear, namely: "similar to MATLAB in term of interface"! Look below to see how you could replace MATLAB with NT²:

Recipe:
  • Take a  .m  (i.e. MATLAB ) file, copy it to a  .cpp file
  • Add  #include <nt2/nt2.hpp>  and do cosmetic changes
  • Compile the file and link with libnt2.a
And what are the "cosmetic" changes, s'il vous plaît?

MATLAB code:
R = I(: ,: ,1);
G = I(: ,: ,2);
B = I(: ,: ,3);

Y = min ( abs (0.299.* R +0.587.* G +0.114.* B) ,235);
U = min ( abs ( -0.169.*R -0.331.* G +0.5.* B) ,240);
V = min ( abs (0.5.*R -0.419.*G -0.081.* B) ,240);

equivalent NT² code:
auto R = I(_,_ ,1) ;
auto G = I(_,_ ,2) ;
auto B = I(_,_ ,3) ;
table <float , of_size_ <N,M> > Y, U, V;

Y = min ( abs (0.299* R +0.587* G +0.114* B) ,235);
U = min ( abs ( -0.169*R -0.331* G +0.5* B) ,240);
V = min ( abs (0.5*R -0.419*G -0.081* B) ,240);

What struck me was that you can switch from the proprietary DSP system to a free C++ replacement that simply! OK, NT² isn't a perfect drop-in replacement, but it's good enough if we weigh it against the royalties to be paid!
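To give a flavour of how such EDSLs work under the hood (NT² uses Boost.Proto; this is a minimal hand-rolled sketch, nothing more): operator+ computes nothing, it only builds a lightweight expression node, and the whole tree is evaluated in a single loop at assignment time - which is what lets expressions like Y = min(abs(...), 235) fuse without temporaries.

```cpp
#include <array>
#include <cstddef>

// An addition node: stores references to its operands and computes
// element values lazily, only when asked.
template <typename L, typename R>
struct Add
{
  const L& l;
  const R& r;
  double operator[](std::size_t i) const { return l[i] + r[i]; }
};

struct Vec
{
  std::array<double, 3> data;

  double operator[](std::size_t i) const { return data[i]; }

  // Assignment is where the whole expression tree is finally evaluated,
  // element by element, in one loop and without temporary vectors.
  template <typename E>
  Vec& operator=(const E& expr)
  {
    for (std::size_t i = 0; i < 3; ++i)
      data[i] = expr[i];
    return *this;
  }
};

// operator+ builds expression nodes instead of computing immediately.
inline Add<Vec, Vec> operator+(const Vec& a, const Vec& b) { return { a, b }; }

template <typename L, typename R>
Add<Add<L, R>, Vec> operator+(const Add<L, R>& a, const Vec& b) { return { a, b }; }
```

Here c = a + b + a performs a single loop over the elements; no intermediate Vec is ever materialized - the same trick, scaled up by Boost.Proto, is what makes the NT² code above competitive.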

And thus the second use-case for embedded DSLs emerges: lure the users of another language into C++! Or, putting it differently: give the user a C++ alternative and let him choose.

BTW: did you notice that the EDSL use-cases always involve the ominous "user", who doesn't really know C++?

--
* Slides for the talk are here.

Sunday, 16 February 2014

Who's to blame? - or the most amazing multithreading bug which wasn't one!



Typically bug-hunting is quite simple, not to say trivial. But sometimes you encounter a species of bug that makes you mad for weeks!!! Programming pundits maintain that these are the C++ multithreading bugs, because locks, well..., um..., they just don't compose! So what are we supposed to do? Either wait for transactional memory support, or switch to another language, preferably Haskell or Clojure. OK, will do, but in the meantime there are products to be developed, delivered and sold*. So here's what happened:

1. The Bug

My current project is developed on Windows with Microsoft's Visual Studio as IDE and Intel's XE as a plug-in compiler. We chose Intel XE because it is (unsurprisingly) very good at optimizing code for the Intel platform and, not least, because of its support for the new C++11 standard. Our code is full of boost::thread, boost::mutex, std::atomic and their ilk, so it falls into the category the gurus are warning about. But - the architecture of the system was recycled from the previous product, so there's no real use in complaining about the non-composability**, and, as we are in the business of streaming files out, performance is more important than advanced architectural concepts. I just removed a couple of vtable race conditions from the code and the whole thing seemed to be running well enough!

But then we changed the Boost version, updated the compiler version, and at the same time I made some changes to the threading model to make the internal architecture more dynamic, and booom! - everything stopped working in a huge crash!


It was "Illegal instruction" and sometimes "Privileged instruction" all the time. What? I'm not issuing assembler instructions in my code! This is definitely something where I can shift the blame onto the compiler! On the other hand, the C++ standard says that data races constitute the dreaded undefined behavior - in short, that all bets are off. So why not an illegal instruction? But then there was another type of crash: "Access violation". OK, that's something you could definitely blame on your programming or a data race.

Let me put one thing straight - the crashes came only when the debugger was attached; without the debugger the process ran without a hitch. But unfortunately this fact doesn't mean very much in the art of debugging multithreaded applications - the debugger changes the program's memory layout, and maybe it only accelerates the manifestation of bugs that are otherwise hiding in dark corners of the code, waiting to become big, ugly Heisenbugs!

First I suspected another vtable race condition and indeed fixed some more instances of it, but that didn't change the overall picture! Next, inspecting the assembly code revealed the following problems:

First, there were crashes in code like that:
5DF3B12C mov dword ptr [ebp-38h],edi 
5DF3B12F mov dword ptr [ebp-3Ch],esi 
5DF3B132 mov dword ptr [ebp-40h],ebx 
5DF3B135 mov dword ptr [this],ecx 
5DF3B13B mov eax,dword ptr [___security_cookie (5E054DC0h)] 
5DF3B140 xor eax,ebp 
5DF3B142 mov dword ptr [ebp-14h],eax 
5DF3B145 mov dword ptr [ebp-1E0h],eax 
5DF3B14B mov byte ptr [ok],1
where the debugger was pointing at mov byte ptr [ok],1 as the illegal instruction! What is illegal here? No progress was possible in this case.

But more often I could see something like:
5DF3B152 mov eax,dword ptr [input_size] 
5DF3B155 db 89h 
5DF3B156 test eax,edx 
5DF3B158 db feh 
5DF3B159 db ffh
here the current instruction was, quite understandably, db feh. The problematic part was that before the crash this code looked completely different, i.e. there wasn't a trace of the suspicious db instructions! Well, one way or another, it looked like some dangling-pointer problem where memory gets overwritten at some arbitrary address. Or does it? On Windows you cannot overwrite the program section of a process, as code segment protection is enabled by default (at least nowadays). Whatever! - I tried to set a data breakpoint at the changed location, but no change at the given address was reported by the debugger. What the heck, I can see that my assembly code changed, how can the debugger be so stupid! So there wasn't any progress on this avenue either.

The second general class of errors was the one causing access violations. It emerged that an innocuous-looking stack pointer manipulation like add esp,0FFFFFFF0h would produce an illegal stack pointer value blowing the program up, even though the previous ESP value was well within the allowed range! Looks like sorcery!
5FB4B906  cmp         eax,0BCh
5FB4B90B  jae         5FB4BC25
5FB4B911  add         esp,0FFFFFFF8h
5FB4B914  lea         eax,[conv]
5FB4B91A  mov         dword ptr [esp],2
with register contents like that:
EAX = 322CE9AC EBX = 1BB16DE0 ECX = 359A56A0 EDX = 00000021 ESI = 322CED30 EDI = 322CEBB4 EIP = 5FB4B91A ESP = FFFFFFFC EBP = 322CEB84 EFL = 00010286
Well, with the stack pointer corrupted, the result of this was:


Here's another beauty - the debugger reported a stack corruption here, as it was completely bewildered by what was going on:



So how was this bug to be handled, and who shall win the blame game?

2. The Explanation

I tried to inspect the memory directly in the memory window of the Visual Studio debugger, but it was somehow too low-level for my mood at that moment. So because:
  • the problems always occurred in the vicinity of a break point,
  • the debugger was always displaying correct values of the operands in the "Access violation" case,
  • the debugger never showed the illegal instructions beforehand, only after the crash happened,
I decided that the only explanation left was that it had something to do with the tools. Therefore I duly reinstalled everything in my toolchain - Visual Studio, the Intel compiler, the Boost libraries - in the vague hope that the installation of some of them could be broken. Alas, nothing seemed to help.

At this point, I declared the issue closed - I had some real work to be done, and if you didn't define excessively many breakpoints, the debugger typically played by the rules. Except when it really mattered, of course (sigh).

I explained the problem to my worthy colleague Bernhard Z. so he could take it up, and he investigated it a little further. And indeed - he found out some very interesting facts about how exactly the debugger was the guilty party!!!

Contrary to what Visual Studio is telling you, it is possible to attach to a running process with two debuggers (!!!). It is done like this - first attach to it with Studio, and then start WinDebug (don't forget to check the "Noninvasive" option, or it will complain!). Now, Bernhard did it, and he could get an independent view of the machine code!

And then, with the second view of the assembly code, the problem became obvious - the debugger inserted the debug-break instruction (int3 <=> 0xcc) at the wrong position in the machine code - not at the first byte of the opcode, but 2 bytes behind it, turning the bytes following the opcode into illegal instructions! Look here at the assembly code before setting the breakpoint (the breakpoint was to be set at the cmovae at 00c74641):
00c7461f 89542404        mov     dword ptr [esp+4],edx
00c74623 e888d00000      call    BoostSerialize!save_data_xml (00c816b0)
00c74628 83c410          add     esp,10h
00c7462b 83c4f0          add     esp,0FFFFFFF0h
00c7462e 8d9560ffffff    lea     edx,[ebp-0A0h]
00c74634 83bd74ffffff10  cmp     dword ptr [ebp-8Ch],10h
00c7463b 8d85c0fdffff    lea     eax,[ebp-240h]
00c74641 0f439560ffffff  cmovae  edx,dword ptr [ebp-0A0h]
00c74648 890424          mov     dword ptr [esp],eax
00c7464b 89542404        mov     dword ptr [esp+4],edx
00c7464f e88cca0000      call    BoostSerialize!save_data_binary (00c810e0)
00c74654 83c410          add     esp,10h
00c74657 83c4f0          add     esp,0FFFFFFF0h
00c7465a 8d9560ffffff    lea     edx,[ebp-0A0h]
00c74660 83bd74ffffff10  cmp     dword ptr [ebp-8Ch],10h
00c74667 8d85c0fdffff    lea     eax,[ebp-240h]
00c7466d 0f439560ffffff  cmovae  edx,dword ptr [ebp-0A0h]
00c74674 890424          mov     dword ptr [esp],eax
00c74677 89542404        mov     dword ptr [esp+4],edx
00c7467b e8d0b90000      call    BoostSerialize!save_data_string (00c80050)
And now the assembler code after the breakpoint has been set:
00c74623 e888d00000      call    BoostSerialize!save_data_xml (00c816b0)
00c74628 83c410          add     esp,10h
00c7462b 83c4f0          add     esp,0FFFFFFF0h
00c7462e 8d9560ffffff    lea     edx,[ebp-0A0h]
00c74634 83bd74ffffff10  cmp     dword ptr [ebp-8Ch],10h
00c7463b 8d85c0fdffff    lea     eax,[ebp-240h]
00c74641 0f43cc          cmovae  ecx,esp
00c74644 60              pushad
00c74645 ff              ???
00c74646 ff              ???
00c74647 ff8904248954    dec     dword ptr [ecx+54892404h]
00c7464d 2404            and     al,4
00c7464f e88cca0000      call    BoostSerialize!save_data_binary (00c810e0)
00c74654 83c410          add     esp,10h
00c74657 83c4f0          add     esp,0FFFFFFF0h
00c7465a 8d9560ffffff    lea     edx,[ebp-0A0h]
00c74660 83bd74ffffff10  cmp     dword ptr [ebp-8Ch],10h
00c74667 8d85c0fdffff    lea     eax,[ebp-240h]
00c7466d 0f439560ffffff  cmovae  edx,dword ptr [ebp-0A0h]
00c74674 890424          mov     dword ptr [esp],eax
00c74677 89542404        mov     dword ptr [esp+4],edx
00c7467b e8d0b90000      call    BoostSerialize!save_data_string (00c80050)
Do you see it now? In Visual Studio's debugger you wouldn't see anything, because Visual Studio hides debug-breaks in the disassembly view! That's the typical Microsoft attitude - we know better what you'd like to see - and they are right about 95% of the time... until it bites you!

The second type of crash can now be explained too: the 0xcc overwrote the correct operand of the add esp instruction, the addition overflowed, and then we got the access violation!

A "trouble ticket" was issued to Intel: http://software.intel.com/en-us/forums/topic/489007. They didn't believe it of course...

3. The moral of the story

Only after all of that became clear could we be positively sure that this wasn't a multithreading bug, and we were able to continue the development as before ;).

The moral is simple, and already well known - sometimes you can blame it on your tools.
Plus perhaps another well-known one - you always need a second opinion (i.e. WinDebug's)!

And because it seems to be a good time for some bragging: this wasn't the only "you can blame it on your tools" moment in my programming career - a couple of years ago I diagnosed a broken multithreaded exceptions/stack-unwinding implementation in a port of the GNU compiler to some obscure UNIX derivative (I won't name names, but it was a rather big telco company). As a result the whole project had to disable exceptions in the compiler. Don't believe it's possible, young Jedi? Yes, it is (or it was at that time - I never needed it again).

--
* some discussion about why locking & mutexes are the scapegoats of programming pundits can be found in the "Is Parallel Programming Hard, And, If So, What Can You Do About It?" book here

** for myself I'd opt for a share-nothing, message-passing, "worker threads"-oriented architecture - but well, nobody asked ;). And that post about "Threads as Workers" isn't written yet anyway.