Sunday 20 September 2015

HTTPS Support for a Casablanca Server and Certificate Error 1312


In the previous post about the Casablanca C++ REST framework I said that our next task would be adding HTTPS support to our solution. So let's do it.

1. The principle

We start with the server side. On the surface it is rather simple: you just specify "https" as the protocol the server is using and voilà, everything is done. For example like this:
  _restListener = http_listener(U("https://localhost:8111/test"));
But if we peek into the code, we'll see that server-side HTTPS is only supported on Windows as of now (Casablanca 2.5.0). OK, let's test it! We start the server, let Chrome fetch some data from the newly created SSL URL*, and... it doesn't work, apparently there's no such URL! What? We've just started our server there, and it's running.

After a second peek into Casablanca's code, we learn that it leverages the Windows HTTP Server API, but does not configure any HTTPS support by itself. Then we learn from the MSDN docs that this is usually done in a configuration step, before the HTTP server is started. Apparently a programmatic solution is possible too, but we start with the simple one (or so we think...)**.

In short, we have to attach a server certificate to a port on the machine where our server runs (i.e. our developer machine, at least in this post). For that, we first have to create a new certificate.

2. First try

For the sake of our tests we want to create a self-signed certificate. On Windows we can use the MakeCert.exe tool for that. So I opened the Visual Studio developer console and typed***:

  makecert.exe -sr LocalMachine -ss Root -a sha512 -n "CN=my_machine_name" -sky exchange -pe -r

to create a self-signed (-r) cert for the local machine (required for the add sslcert step later), export its private key (-pe), use it for exchange (-sky) and store it straight away in the trusted Root Certification Authorities store. OK, it worked.

Next we open the MMC, check that the certificate is there and it has no errors. Don't know how? It's explained several times on the Web, for example here.

Now we only need to set the certificate for the server's IP address. The command for that is:

  netsh http add sslcert ipport=0.0.0.0:8111 certhash=XXXX appid={yyyyy}

For the certhash we can use the thumbprint from the certificate's Properties window in MMC; for the application ID the most frequent advice is to use the ID of the IIS Server, but as we do not even have it installed on our machine, we prefer to use an empty GUID: {00000000-0000-0000-0000-000000000000}, which is allowed too. Should be working OK!

Alas, the dreaded Error 1312 reared its ugly head:

  Certificate add failed, Error: 1312 A specified logon session does not exist. It may already have been terminated.

WTF? What is that supposed to mean? There are many possible reasons cited on the Web: a bad certificate hash (nope, double-checked it), a missing Windows hot-fix (nope, it's from 2010, already installed), a private key that wasn't exported (nope, we did that), some mysterious cases when certificates get renewed (nope, ours are fresh), etc., etc. But definitely not a problem with a logon session, that much is true!

3. The second try

What should we do? As always the advice is: go back to the last known working setup. To do this, I followed the advice in this blogpost quite meticulously, and created two certificates:

  makecert -sk testRootCA -sky signature -sr localmachine -n "CN=RootTrustedCA" -ss TRUST -r RootTrustedCA.cer

  makecert -sk testServer -ss MY -sky exchange -sr localmachine -n "CN=Server" -ic RootTrustedCA.cer -is TRUST Server.cer -pe


The first one is supposed to be the base, trusted one; the second references the first. That didn't cut it either, because the RootTrustedCA still wasn't trusted - I had to add it to the Root store using MMC (don't know how? - look here).

This time I didn't want to type the certificate hash, so I used this nice tool to apply the certificate to the desired IP address, and guess what - it worked! The last problem was that the domain name and the certificate's CN didn't match, so I created a cert with my machine's name:

  makecert -sk testMachine -ss MY -sky exchange -sr localmachine -n "CN=my_machine_name" -ic RootTrustedCA.cer -is TRUST my_machine_name.cer -pe

applied it to 0.0.0.0:8111 and, at last, everything worked like a charm.




4. Conclusion

As it seems, add sslcert (and by extension, Windows' HTTP Server API) only supports certificates which reference a base trusted one, and doesn't like the ones placed directly in the Root store. Remember: the certificate has to be in the Personal (MY) store and has to be in the computer account (i.e. for the local machine)!

Update: The above conclusion is in principle correct, but only when we add two small words: "by default"! As explained in a friendly comment, add sslcert accepts a command-line switch where the desired certificate store can be specified!
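I haven't re-tested our setup with it, but presumably that's the certstorename parameter, so pointing add sslcert at the machine's Root store would look like this:

  netsh http add sslcert ipport=0.0.0.0:8111 certhash=XXXX appid={yyyyy} certstorename=Root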

OK, that was a little tiring, we'll add support for the client side in another post.

PS: Later I had to add configuration of the SSL certificates to our server configuration code - lots of Windows system programming :-/.

--
* I mean TLS, but for historical reasons....

** Of course in the real server we will implement a programmatic solution for configuring the SSL options for a given URL. You may look up the HTTP Server API functions HttpSetServiceConfiguration and HttpQueryServiceConfiguration.
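For the curious, a minimal sketch (not our production code) of what such programmatic configuration could look like - the thumbprint, the address and the error handling are placeholders you'd have to fill in:

#include <windows.h>
#include <http.h>
#pragma comment(lib, "httpapi.lib")

// Bind an SSL certificate (given by its SHA-1 thumbprint) to an IP:port -
// the programmatic equivalent of "netsh http add sslcert".
bool BindSslCertToPort(SOCKADDR_IN& ipPort, BYTE* certHash, ULONG certHashLen, const GUID& appId)
{
  if (::HttpInitialize(HTTPAPI_VERSION_1, HTTP_INITIALIZE_CONFIG, nullptr) != NO_ERROR)
    return false;

  HTTP_SERVICE_CONFIG_SSL_SET sslCfg = {};
  sslCfg.KeyDesc.pIpPort = reinterpret_cast<PSOCKADDR>(&ipPort);  // e.g. 0.0.0.0:8111
  sslCfg.ParamDesc.SslHashLength = certHashLen;                   // certificate thumbprint...
  sslCfg.ParamDesc.pSslHash = certHash;                           // ...as raw bytes
  sslCfg.ParamDesc.AppId = appId;
  sslCfg.ParamDesc.pSslCertStoreName = const_cast<PWSTR>(L"MY");  // Personal store, local machine

  ULONG result = ::HttpSetServiceConfiguration(nullptr, HttpServiceConfigSSLCertInfo,
                                               &sslCfg, sizeof(sslCfg), nullptr);
  ::HttpTerminate(HTTP_INITIALIZE_CONFIG, nullptr);
  return result == NO_ERROR;
}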

*** not quite, first I tried to create a private certificate referencing an already existing, trusted one with:

  makecert.exe -sr LocalMachine -ss MY -a sha512 -n "CN=my_machine_name" my_machine_name -sky exchange -pe -eka 1.3.6.1.5.5.7.3.1

here the implicit Root Agency certificate is used. The problem is that the created certificate was broken; MMC said: "This certificate has an invalid digital signature". After much ado it turned out that the referenced Root Agency certificate uses a public key which is too short (only 512 bits!!!) and is thus considered insecure. Thanks for a very informative error message!


Casablanca C++ REST Framework - One Year Later.


Approximately one year ago I started a new project for one of my customers with the aim of adding a REST interface to their redesigned product. The new release was to transform their hitherto desktop-only, Windows-only application into a cross-platform, distributed, client-server one, using an HTTP API for communication.

The technologies to be used were Qt, C++11, Windows and Apple's OS X. The question was how to implement the REST interface.

The Setup

After investigating some choices*, I settled on Microsoft's C++ REST SDK, code-named Casablanca. It was cross-platform (sic!), used modern C++ constructs (C++11 - you know I like that!) and was open source (sic!). Doesn't sound like Microsoft, but I think we still need some time to get used to the new Microsoft.

There were some problems, though. The client chose the Qt 5 framework for portability, and initially I was worried whether Casablanca would play well with Qt's "I am the world" attitude, its message pump and its threading model. Moreover, the server implementation resides in an "experimental" namespace, which normally isn't a good sign either!

On the positive side there was JSON support and a nice asynchronous file transfer implementation based on PPLX tasks (i.e. PPL on Windows; for Linux, OS X, etc. Microsoft wrote a port). This was a big one, as the main functionality of the server would be processing files, and the input files would mostly be uploaded from other machines. And of course the biggest one - it's open source!

So the endeavor was not without risk! What can I say now, a year (give or take a couple of months) after we started?

Highlights

One of the highlights is of course the task-based implementation of asynchronous processing used in Casablanca's API, like here, in the already mentioned built-in support for file transfers:
CCasablancaFileTransfer::Task CCasablancaFileTransfer::StartFileDownload(const QString& downloadUrl, const QString& localFilepath) const
{
  using concurrency::streams::istream;
  using concurrency::streams::streambuf;
  using concurrency::streams::file_buffer;

  web::http::uri url(downloadUrl.ToStdString());
  web::http::client::http_client client(url);

  web::http::http_request getRequest(web::http::methods::GET);
  getRequest.headers().add(web::http::header_names::accept, "application/octet-stream"); 

  return client.request(getRequest)
    .then([=](pplx::task<web::http::http_response> previousTask)
  {
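    // this continuation is scheduled asynchronously, once the response (or an error) arrives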
    try
    {
      auto response = previousTask.get();

      if (response.status_code() != web::http::status_codes::OK)
      {
        QString errTxt = ".....";
        return pplx::task_from_result(std::make_pair(false, errTxt));
      }

      try
      { 
        streambuf<uint8_t> localFile = file_buffer<uint8_t>::open(localFilepath.ToStdString()).get();

        return response.body().read_to_end(localFile)
          .then([=](pplx::task<size_t> previousTask)
        {
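          // the by-value captured streambuf is const in this non-mutable lambda, hence the cast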
          streambuf<uint8_t>& nonconstFile = const_cast<streambuf<uint8_t>&>(localFile);
          nonconstFile.close().get();

          // look up the (optional) ETag header of the response
          QString maybeEtag = ftutil::FindHeader(response, web::http::header_names::etag);

          return pplx::task_from_result(std::make_pair(true, maybeEtag));
        });
      }
      catch (...)
      {
        return TranslateFileException(localFilepath);
      }
    }
    catch (...)
    {
      return TranslateWebException();
    }   
  });
}
Please notice that each block following a .then() will be executed asynchronously, in a separately scheduled thread (or task), when the preceding step finishes! You can do the same on the client side, of course. Alternatively, you can force blocking processing of a task by calling its get() method.

If you like comparisons, you may have a look at Facebook's Futures in fbthrift. They generally work like Casablanca's tasks, but have an additional nice onError() clause and even the possibility to choose a specific executor!

Note: I won't give an introduction to Casablanca here; the basic usage has been explained several times on the Web (look here and here for basic client usage, here for a basic server example, and here for file transfers). However, what I found missing in all the intro material I've seen is a mention of exception propagation between asynchronous tasks. The problem is that a thrown exception has to be "observed" by the library user, and if it isn't observed, Casablanca will "fail fast", i.e. take down the server in the destructor of the task that threw. Surprise, surprise, your server is crashing! An exception is observed (i.e. marked as such and then rethrown) if the .get() or .wait() methods are called for that task or one of its continuations. So be cautious! The above code thus needs an additional try-catch clause around the final .get() call, but I omitted it for the sake of simplicity...
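For illustration, a minimal sketch of observing a download task's outcome on the caller's side - fileTransfer, url and localPath are hypothetical names, and the task type matches StartFileDownload() above:

auto task = fileTransfer.StartFileDownload(url, localPath);
try
{
  // .get() blocks and rethrows any exception from the continuations,
  // thereby "observing" it - no fail-fast in the task's destructor
  std::pair<bool, QString> result = task.get();
}
catch (const std::exception& e)
{
  // handle/log the now-observed exception here
}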

So it didn't take long before I could announce our first successes.

Problems

1. The only really big problem that hit us was performance. The trouble was that after a couple of thousand file uploads or downloads, server performance tumbled into free fall: the same basic polling which normally took 2-3% of CPU time surged to 20-30%! That was a real show-stopper at first. And of course it wasn't our code that showed up in the profiler, it was some of Casablanca's internals!

It took me about 2 weeks to investigate, and maybe I'll write a more detailed post about it some time, but for now it suffices to say that the Windows Concurrency Runtime (i.e. the native, task-based implementation of PPL) was left with an ever-growing internal list, which was sequentially scanned on each tick of the scheduler - at least in our environment of Windows 7 and Visual Studio 2013 plus an early Casablanca version (1.2.0).

I reported it to Microsoft, and they reacted pretty quickly - the next release was given a define (CPPREST_FORCE_PPLX) to disable the Concurrency Runtime and switch over to the Windows-Threadpool-based implementation of PPL. I got the latest version from the development branch, tested it, and voila! - our performance problems vanished into thin air. We then waited for the next official release (2.5.0), and when it came out we upgraded our code to it, and suddenly everything worked like a charm. BTW, for the Casablanca version for Visual Studio 2015 the Windows Threadpool is the default setting (or so I was told)**.
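If I remember correctly, the define has to be set consistently for both the Casablanca build and the code that includes its headers, e.g. on the compiler command line:

  cl /DCPPREST_FORCE_PPLX ...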

2. Another problem was the lack of built-in CORS support - I had to implement it myself. It wasn't that difficult, but it's bound to our mixed Casablanca/Qt environment, so unfortunately I couldn't contribute it back to the project.
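A rough sketch (not our actual implementation, and the header values are examples only) of what manual CORS support boils down to: answer the preflight OPTIONS request and attach the Access-Control-* headers to the regular responses:

void AddCorsHeaders(web::http::http_response& response)
{
  // example values - a real implementation would echo/validate the Origin header
  response.headers().add(U("Access-Control-Allow-Origin"), U("*"));
  response.headers().add(U("Access-Control-Allow-Methods"), U("GET, PUT, POST, DELETE"));
  response.headers().add(U("Access-Control-Allow-Headers"), U("Content-Type"));
}

void HandleOptions(web::http::http_request request)
{
  // CORS preflight: no body needed, just the Access-Control-* headers
  web::http::http_response response(web::http::status_codes::OK);
  AddCorsHeaders(response);
  request.reply(response);
}

// registration on the listener:
//   _restListener.support(web::http::methods::OPTIONS, HandleOptions);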

3. Then there were Qt-specific problems, the first of them Qt's notorious usage of the /Zc:wchar_t- flag on Windows. This means that Qt uses for wide characters not a native type (as the standard would require) but a typedef to unsigned short. Thus you won't be able to link Casablanca to your Qt-compliant project, because std::wstring will be mangled to $basic_string@GU by Qt (and your code) but to $basic_string@_WU by a standard Casablanca build.

The remedy is to build Casablanca on your own (i.e. not to use the NuGet package) with /Zc:wchar_t-. The build will first fail with an error complaining about a duplicate definition of wchar_t, but commenting out the now-redundant definition will suffice. Hmm, when I think of it, I should probably contribute this back to Casablanca with some #ifdef combination...

4. Another Qt-related problem I stumbled upon was nothing less than a new type of deadlock (at least new to me). I'll dub it the "big signal-mutex confusion deadlock". It's quite interesting: imagine that your handler does one half of its work in a Casablanca thread but the second half in the main Qt thread, and uses some lock to protect a resource from parallel access.

Now imagine you lock the mutex in the Casablanca part of the GET handler and emit a signal to continue processing in the Qt context. Meanwhile another handler (e.g. DELETE) didn't need to lock in the Casablanca context, already emitted its signal, and is now in the Qt context trying to lock the mutex. The problem: the second part of the GET handler will never execute, as the message pump is blocked waiting for the mutex, which can be unlocked only by the next signal to come - deadlock. Mind-boggling? Well, threads are hard. Remedy: always lock the resource in the Casablanca part of the handler, even if you don't need it there.
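A condensed sketch of that situation (class and member names are made up; the signal/slot connections from the Casablanca threads into the Qt main thread are assumed to be queued):

#include <QObject>
#include <QMutex>
#include <cpprest/http_msg.h>

class CRequestBridge : public QObject
{
  Q_OBJECT
signals:
  void ContinueGet();
  void ContinueDelete();

public:
  void HandleGet(web::http::http_request request)     // runs in a Casablanca thread
  {
    m_mutex.lock();          // first half: lock here...
    emit ContinueGet();      // ...queued into the Qt event loop
  }

  void HandleDelete(web::http::http_request request)  // runs in a Casablanca thread
  {
    emit ContinueDelete();   // no lock needed here (or so we thought)
  }

private slots:
  void GetContinuation()    { /* use the resource */ m_mutex.unlock(); } // ...unlock here
  void DeleteContinuation() { QMutexLocker lock(&m_mutex); /* ... */ }
  // ^ if DeleteContinuation() is dispatched first, it blocks the event loop on
  //   m_mutex, GetContinuation() is never dispatched, and nobody ever unlocks

private:
  QMutex m_mutex;
};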

Admittedly, the problem looks somewhat artificial, but that's a consequence of the somewhat inane requirement that the requests should be finally processed in the Qt context - a requirement originating from another part of the system, which I won't discuss or disparage here.

5. A minor problem was timestamp comparison using Casablanca's utility::datetime class:
int CBasicLastModifiedMap::CompareSec(const utility::datetime& lhs, const utility::datetime& rhs)
{
#ifdef _DEBUG
  // debugging aid: human-readable timestamps for inspection in the debugger
  auto lhsStrg = lhs.to_string();
  auto rhsStrg = rhs.to_string();
#endif

  int timestampDiffSec = lhs - rhs; // the difference is truncated to whole seconds!

  if (timestampDiffSec == 0)
    return 0;
  // extra check needed, because timestampDiffSec is never negative (Casablanca problem!)
  else if (lhs.to_interval() > rhs.to_interval())
    return 1;
  else
    return -1;
}
Bug or feature? Decide for yourself.

6. One (minor?) problem we encountered was specifying all interfaces for the server to bind to. Normally you'd expect to be able to use "0.0.0.0", but Casablanca rejects it. After consulting the source code the solution was clear: just use "*" instead!***
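That is, in the spirit of the listener example from the HTTPS post:

  _restListener = http_listener(U("http://*:8111/test"));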

Conclusion

Otherwise: we are quite happy!!! The system works cross-platform, the performance is good, no apparent problems there.

Well, no problems till now. My client didn't specify any security requirements for the first version of the product; they assumed (quite reasonably) that the customers would use the new product only inside their trusted network at first. Neither is any notion of client identification or client roles planned, nor are we using gzip compression of the HTTP data. Thus the more advanced features of an HTTP server weren't required.

As the next step towards more complicated scenarios, we'll look at HTTPS support in Casablanca. See you then...

Update:

OK, we actually found one weird bug while testing: "http_listener crashes when URI contains a pending square bracket" (bug report + proposed fix here). I had to patch it locally for the time being, but as it seems, it'll be fixed in version 2.6. Weird - other malformed URLs are rejected cleanly, only square brackets generate crashes.
Update 2: same problem with square brackets in HTTP parameters, only I haven't resolved it yet (no time, a low-prio bug). So the story continues.
Update 3: The above problem resolved, see my code comments for the explanation:
  // As of Casablanca 2.5 http_request::relative_uri() will throw an exception if it encounters (correctly) encoded
  // "[", "]" or "#" characters in the query parameters!
  //  - workaround: extract the relative path by hand (message.absolute_uri().path() works)
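In code, the workaround amounts to something like this (message being the incoming http_request; a hypothetical fragment, not the actual patch):

  // don't call message.relative_uri() - it throws on encoded "[", "]", "#";
  // take path and raw query from the absolute URI instead
  auto path  = message.absolute_uri().path();
  auto query = message.absolute_uri().query();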

Another Update (07 Sep. 2016):

I said we were overall happy with Casablanca, but there's a new problem, and it could be pretty grave. Namely, the Casablanca server tends to crash sometimes. In normal operation it's very seldom, but under severe overload it may happen rather more often :(. At first I thought other parts of the code had some dangling pointers, and put this error into the waiting room.

Recently I took some time to analyze it, and it seems to be a genuine Casablanca problem, somehow connected to the handling of timed-out client connections (or so it seems at the moment). In release 2.8.0 there was a pull request that refactored this part of the code to remove some race conditions, but I'm not sure if this was sufficient to fix the crash...

As soon as I've solved that, I'll blog. For the moment take heed: the server part is still in the "experimental" namespace (as of 2.8.0) and:
"The http_listener code is the least tested code in the library despite its age, so there are certainly a lot of issues here that need fixing."
--
* like: QtWebApp (by Stefan Frings), Qxt's WebModule, Tufao, QHttpServer, Pillow, nanogear, Apache's Axis2/C, Casablanca (of course!), cpp-netlib, gSoap WebServer, libmicrohttpd (by GNU), libhttpserver (based on libmicrohttpd), Mongoose, libevhttp. They were (for the most part) fine, sometimes even great, but virtually none of them had C++11 or async file transfer support!

Update: Facebook's Wangle could be a good candidate too; it seems to have decent async support (see here), but...: at that time I didn't find it (definitely a show-stopper ;). At the time of this writing it's got problems compiling on OS X, and I'm not sure if it compiles on Windows at all. There's simply no statement on the project's page about which platforms are supported. AFAIK Thrift (or was it Folly?) seems to compile on Windows, but this compilation mess could be a problem.

** as it seems, the Concurrency Runtime isn't used anymore in Microsoft's STL implementation (at least on OSes newer than XP) - http://blogs.msdn.com/b/vcblog/archive/2015/07/14/stl-fixes-in-vs-2015-part-2.aspx:
"Using ConcRT was a good idea at the time (2012), but it proved to be more trouble than it was worth.  Now we're using the Windows API directly, which has fixed many bugs."
and:
"... std::mutex on top of ConcRT was so slow!"  
*** I was recently asked about that problem in an email by a reader of this post, and I realized I had forgotten to mention it in the first write-up. Sorry!