Sunday, 29 November 2015

OCaml and Multithreading


As Doctor House said one day: "There was that philosopher Ocaml..." Stop, wrong start! Try again.

1. OCaml story

OK, some time ago, in a somewhat better world, I became attracted to the OCaml language because of its speed, elegance, light syntax, algebraic data types and automatic type inference (as you might know by now, I'm a bit of a typing fan). I must admit it was my first functional language, so maybe I'm a little sentimental about it. Later on I somehow didn't give it much love and affection, mostly because I got distracted by other, more fashionable languages like Clojure, F# and Haskell (shame on me!).

Nonetheless, although it is a rather obscure language, one company has been using it big time to do some highly parallel, high-performance data processing. This company is Jane Street, and they are a financial market player. The reason they chose OCaml was that the owners wanted to read every line of code performing financial transactions*:
"Early on, a couple of the most senior traders (including one of the founders) committed to reading every line of code that went into the core trading systems, before those systems went into production"
"In 2003, Jane Street began a rewrite of its core trading systems in Java. The rewrite was eventually abandoned, in part because the resulting code was too difficult to read and reason about—far more difficult, indeed, than the VBA that was being replaced."
 "The VBA code was written in a terse, straight-ahead style that was fairly easy to follow. But somehow when coding in Java we built up a nest of classes that left people scratching their heads when they wanted to understand..."
 "In 2005, emboldened by the success of the research group, Jane Street initiated another rewrite of its core trading systems, this time in OCaml."
So somehow the functional-style code was more readable than a pile of Java classes: should we attribute that to the superiority of functional languages in general, or just to bad training of Java programmers? An open question.

OK, there are additional technical reasons as well: the type inference that makes the code concise (thus more readable) while retaining type safety. Additionally, its runtime performance is pretty good! A double win against, for example, Python.

2. Multithreading?

For a long time they remained the only serious industry user I knew of, but recently Facebook also started using OCaml in Hack and Flow, their typing add-ons for PHP and Javascript. Surprise! And a very nice one, my pet language got appreciated at last! Everything OK?

Not quite: when I listened to that presentation, I was surprised to learn that OCaml's multithreading support is pretty much completely broken. How can that be? Why didn't I notice that in the first place?



Well, after some more digging, I received this link from a fellow programmer. It makes it pretty clear to us all that OCaml designers were quite a nasty bunch of "multicore deniers":
"The goals of OCaml threads are (2) and (3) but not (1)"
Where (1) denotes "Parallelism on shared-memory multiprocessors"! And then:
"Shared-memory multiprocessors have never really "taken off", at least in the general public.  For large parallel computations, clusters (distributed-memory systems) are the norm. For desktop use, monoprocessors are plenty fast."
 "What about hyperthreading?  Well, I believe it's the last convulsive movement of SMP's corpse :-)"
Yes, they didn't like it, to say the least. But there's another reason as well: the GC implementation in OCaml is single-threaded and thus could be made very fast, which was an important factor in OCaml's initial successes. But then this (ironically) proved detrimental when trying to make the language MT-safe**.

The only way to do parallel processing in OCaml is thus multiprocessing + message passing. As I learned, Jane Street wrote the Parallel library to support that:
"Parallel is a library for spawning processes on a cluster of machines, and passing typed messages between them. The aim is to make using another processes as easy as possible. Parallel was built to take advantage of multicore computers in OCaml, which can't use threads for parallelism due to it's non reentrant runtime
The Facebook people went one step further: they
"...use the same model for multithreading: a specially mmap'd region shared between different fork'd processes, containing a shared, lockless hash table. "
Thus they have multithreading without threads - a couple of processes sharing the memory space (or a part of it), faking the SMP model! You must admit it's rather brilliant - you spare yourself all the serialization, deserialization, and message passing that normally shave off a significant part of the performance.
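Just to illustrate the trick (a toy POSIX sketch of mine, not Facebook's actual code): a region mmap'd with MAP_SHARED before a fork() stays shared between parent and child, so both processes can communicate through plain memory:

  #include <sys/mman.h>
  #include <sys/wait.h>
  #include <unistd.h>
  #include <cstdio>

  int main()
  {
    // create the shared memory region *before* forking
    int* shared = static_cast<int*>(mmap(nullptr, sizeof(int),
                                         PROT_READ | PROT_WRITE,
                                         MAP_SHARED | MAP_ANONYMOUS, -1, 0));
    *shared = 0;

    if (fork() == 0)   // child process: one of the "threads"
    {
      *shared = 42;    // writes straight into the shared region
      _exit(0);        // no serialization, no message passing!
    }

    wait(nullptr);     // parent process: the other "thread"
    std::printf("child wrote: %d\n", *shared);   // prints: child wrote: 42
    munmap(shared, sizeof(int));
    return 0;
  }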

3. So why OCaml?

Apparently, despite the multithreading disaster, companies find OCaml worth trying out. Why? I'd say the reasons are:

  1. It supports functional style (but it allows dirty tricks when they must be done)
  2. Has strong typing unlike Lisp, Python, etc.
  3. Not bound to Windows and .NET like F#
  4. Not Haskell, you'll be able to read and understand it.***

And finally, maybe this whole multithreading business isn't worth the hassle? Facebook clearly need it, but Jane Street don't. I remember having heard Yaron Minsky (of Jane Street) saying somewhere that parallel processing plus message passing is a much safer model than multithreading. And in finance you need that safety.

I can accept this argument in a more general setting as well, because the only reason we have threads is performance gains (OK, and a deceptively "natural" programming model too). And if the performance of forking + message passing is sufficient, that's definitely a gain! Jane Street does some massive parallel processing in real time, so the performance seems to be not that bad after all...

Or maybe the fork+pass model is slower, but they are gaining on the excellent low-latency GC? Jon Harrop puts it like this:
"OCaml has a nice (~10ms) low latency GC and it is easy to optimise OCaml code for low latency. In contrast, the .NET GC is much more complicated and, therefore, harder to optimise for and, in my experience, has >10x higher latency. I suspect that would be a major drawback for Jane Street."
So before you port OCaml's GC to multicore, overcomplicating it and losing its excellent performance, maybe paying some price for the workarounds pays off better?

Having said that, there's recent work on improving the GC performance even more, and even a new attempt at multicore OCaml. Maybe this time they will be successful. And be wary, not everyone likes OCaml; for some, "OCAML sucks" - http://www.podval.org/~sds/ocaml-sucks.html!

Update: an impressive list of companies using OCaml: https://ocaml.org/learn/companies.html (via @jonharrop). So it's not only Jane Street and Facebook!
Update 2: Facebook continues working on the OCaml ecosystem - Reason, a "new interface to OCaml" which "provides a new syntax and toolchain for editing, building, and sharing code". Plus Infer - the iOS/Android app checking tool!

--
* "OCaml for the masses" - http://queue.acm.org/detail.cfm?id=2038036

** "The rise and fall of OCaml" - http://flyingfrogblog.blogspot.co.uk/2010/08/rise-and-fall-of-ocaml.html

 ***  As someone said, Haskell was created to do research on Haskell (and OCaml was created to write theorem provers).


Thursday, 12 November 2015

Monads, who's afraid of Monads (and C++)?


Nowadays every programmer has heard of Monads, and either has a PhD in CS or is afraid of them (or both...). When I first heard of them (my university years are quite distant, and category theory wasn't en vogue yet in those distant times) I thought: "Monadology? Leibniz? WTF? Didn't Kant disprove that?"

Because writing a monad tutorial is sooo last decade, I will just reference one quite good, practical introduction to that stuff that uses Javascript (Javascript! => nice, no need to learn a new, unreadable & incomprehensible language first! 😀). I think it's <praise>practical enough to be useful, but then again not shying away from maths in order to please the reader</praise>.

If you don't want to read it, and are smug enough to think that a 10,000 feet view is all you'll ever need, voila - here's a short summary:

Monads Summary

1. Functor

An object encapsulating a value, plus an interface (called "fmap" for historical reasons) taking some function to act on that value. Because of the immutability/pure functions stuff it doesn't change the contained value, of course, but produces a new functor instance.
            +------------+                                  +------------+
            |   value    |                                  |  newvalue  |
      f     |            |     fmap()                       |            |
  +-------> |   fmap()   |  +--------->  f(value) +-------> |   fmap()   |
            |            |                                  |            |
            +------------+                                  +------------+
Is the picture clear enough? OK, let's continue.
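If you prefer code to boxes, here's a bare-bones sketch in C++ (my toy illustration; the referenced book does the same in Javascript):

  #include <iostream>
  #include <string>

  // Box is a minimal functor: it encapsulates a value and exposes fmap()
  template <typename T>
  struct Box
  {
    T value;

    // fmap() doesn't touch the contained value - it returns a new Box!
    template <typename F>
    auto fmap(F f) const -> Box<decltype(f(value))>
    {
      return { f(value) };
    }
  };

  int main()
  {
    Box<int> b{ 42 };
    auto c = b.fmap([](int n) { return std::to_string(n) + "!"; });
    std::cout << c.value << "\n";   // prints: 42!
  }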

2. Monad

Is a Functor which additionally has "collapse" and "wrap" methods. Collapse (also called "join" in some contexts) will remove one level of the packaging added by the functor, "wrap" (also called "return") will add one. Simple like that!
  +-------------------------------+                               
  |                               |                               
  |            +-------------+    |                 +------------+
  |            |             |    |       join      |            |
  |   value =  |    value    |    |   +---------->  |   value    |
  |            |             |    |                 |            |
  |            +-------------+    |                 +------------+
  |                               |                               
  +-------------------------------+                               

                         +------------+
               wrap      |            |
   value   +---------->  |   value    |
                         |            |
                         +------------+
Well, not quite: the classic use case** that made the whole concept famous hinges on another method called "mbind", which allows us to apply several functions to the monadic value in a row - an emulation of imperative programming's (and imperative always means "evil, evil" in this context) standard sequential flow of control. But wait a minute: isn't that simply "fmap" + "join"? Yeah, you guessed it, it is***.
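Again, if code says more to you than diagrams, here's a toy Maybe in C++ (a sketch with my own naming; real code would rather use std::optional or the expected class mentioned below):

  #include <cstdlib>
  #include <iostream>
  #include <string>

  // a minimal Maybe monad: a value that may be absent
  template <typename T>
  struct Maybe
  {
    bool hasValue;
    T value;
  };

  // "wrap" (a.k.a. return): pack a plain value into the monad
  template <typename T>
  Maybe<T> wrap(T v) { return { true, v }; }

  // "collapse" (a.k.a. join): remove one level of packaging
  template <typename T>
  Maybe<T> join(const Maybe<Maybe<T>>& mm)
  {
    return mm.hasValue ? mm.value : Maybe<T>{ false, T() };
  }

  // "mbind" == fmap + join: apply f (which itself returns a Maybe), then flatten
  template <typename T, typename F>
  auto mbind(const Maybe<T>& m, F f) -> decltype(f(m.value))
  {
    return m.hasValue ? f(m.value) : decltype(f(m.value)){ false, {} };
  }

  // a computation that can fail - no exceptions, no error codes
  Maybe<int> parsePositive(const std::string& s)
  {
    int n = std::atoi(s.c_str());
    return n > 0 ? wrap(n) : Maybe<int>{ false, 0 };
  }

  int main()
  {
    // chain operations that may fail, without a single explicit error check
    auto result = mbind(wrap(std::string("42")), parsePositive);
    std::cout << (result.hasValue ? result.value : -1) << "\n";   // prints: 42
  }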

Simple? Yes, I think so. Was it that complicated? No. So why all that fuss? Don't know, won't comment here, make up your own mind.

But why should this be useful? Again, I don't want to write a monad tutorial here (it's so ...), read the Javascript book, or maybe some general functional programming resources. Start with error handling and the Maybe type.

High level enough? If not, try this classic explanation: "A monad is just a monoid in the category of endofunctors, what's the problem?"*. Here you'll need some category theory/maths, there's no mercy:

Another Summary

1. Endofunctor

A functor (see above) mapping values to values of the same type. On that level, a functor is a mapping between categories, where a category is a "set" of values plus all the functions between those values. Thus our functor above maps values of type A to values of type B and functions A->A to functions B->B, respecting some common sense restrictions on its way; an endofunctor is one that maps a category back to itself.

You see, it's getting pretty messy quite quickly, but take it as a mathematical recreation and follow through.

2. Category of endofunctors

All endofunctors plus functions mapping one endofunctor (i.e. a functor, i.e. a mapping from one "set" of objects+mappings to another) to another.

3. Monoid in that category...

A monoid is a "set" of objects with a binary operation A×A->A defined for them (usually called "mappend" - you get it, basically a glorified string or array concatenation) plus a neutral element. Now just imagine that for the category of endofunctors, do some math, and you will see that all that gunk collapses to our old, trusty monad as described above.
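If you want to see it spelled out (my compressed sketch - check Mac Lane for the real thing): the "mappend" of that monoid is a natural transformation μ : T∘T → T, which is exactly our "collapse"/"join", and the neutral element is η : Id → T, which is exactly our "wrap"/"return". The monoid laws then read:

  % associativity and unit laws of the monoid = the monad laws
  \mu \circ T\mu = \mu \circ \mu T
  \qquad\text{and}\qquad
  \mu \circ T\eta = \mu \circ \eta T = \mathrm{id}_T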

Why so complicated? Well, that's maths (and mathematicians) for you. But don't despair, it's only a construction of the human mind, so everyone can get a grip on it. The question is only whether that feels like fun to you or not. In any case, don't mythologize!

And That's It!

Now if you feel like it, you can try these things out in C++: there are a couple of libraries implementing this stuff, like this, this or that. A simple, self-sufficient implementation is possible too! There was even a C++ language proposal for a monadic expected class to be used for error handling, so somehow this stuff is coming. Now that you are not intimidated anymore, go and try it!

But first read the Javascript online book to familiarize yourself with the concepts. And yes, you don't need to learn Haskell**** and then read Haskell books for that! So do not fear.

Maybe you'd even like to go further and investigate monadic parsers (because I like F# you'll get this link and this link), applicatives & applicative parsers (sorry, no link here, I didn't find anything simple enough) or free monads. Who knows, the sky is the limit now, and you don't fear anything anymore! At least you can read the jargon without cold shivers running down your spine.

PS: Maths guys, please no hair-splitting, I didn't cross-check it with Mac Lane & Awodey, just wrote it down as I remembered it. However, do notice the "set" usage ;) ...


Update: If you want to continue in a similar vein, here's an interesting page demythologizing functional programming on its own: github.hemanth.functional-programming-jargon!

--
* famously attributed to Philip Wadler, but it's rather a joke, as the real sentence was:
"All told, a monad in X is just a monoid in the category of endofunctors of X, with product × replaced by composition of endofunctors and unit set by the identity endofunctor."
** by..., you guessed it, Phil Wadler: "Monads for functional programming" (it's CS + FP but nonetheless quite readable, you may risk a try now that you fear nothing!)

*** in some contexts "point" and "flatMap" are used, the first for "wrap", the second for "fmap" + "collapse"

**** "I'm Haskell, the language of purebloods..." 😉

Friday, 23 October 2015

Switch() Statement for Types in C++


I saw this question on Stack Overflow some time ago; it amounted to emulating a switch() statement in template code. Then recently I saw a nice example of doing this.

First, the oldskool style, using Boost and C++98:
  template <int N>
  struct shortest_fitting_int
  {
    BOOST_STATIC_ASSERT(N > 0);
    BOOST_STATIC_ASSERT(N <= sizeof(int64_t));
 
    typedef
      typename boost::conditional< (N <= sizeof(int8_t)),
        int8_t,
        typename boost::conditional< (N <= sizeof(int16_t)),
          int16_t,
          typename boost::conditional< (N <= sizeof(int32_t)),
            int32_t,
            int64_t
          >::type
        >::type
      >::type type;
  };
Note the usage of BOOST_STATIC_ASSERT() and the ubiquitous ::type suffix!

Now have a look at the "modern" style (static_assert came with C++11, the _t alias templates with C++14):
  template <int N>
  struct shortest_fitting_int
  {
    static_assert(N > 0, "negative N");
    static_assert(N <= sizeof(int64_t), "N > 8");
 
    typedef
      std::conditional_t< (N <= sizeof(int8_t)),
        int8_t,
        std::conditional_t< (N <= sizeof(int16_t)),
          int16_t,
          std::conditional_t< (N <= sizeof(int32_t)),
            int32_t,
            int64_t
          >
        >
      > type;
  };
Did you notice how much more readable the code is? The small, innocent-looking innovation of using the XXX_t alias templates (like void_t, enable_if_t and their ilk) removes so much clutter! Nice one!
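If you want to convince yourself that it really behaves like a switch(), here's a quick compile-time check (my test lines, not from the original post; you'll need <cstdint> and <type_traits>):

  static_assert(std::is_same<shortest_fitting_int<1>::type, int8_t>::value,  "1 byte fits into int8_t");
  static_assert(std::is_same<shortest_fitting_int<3>::type, int32_t>::value, "3 bytes need int32_t");
  static_assert(std::is_same<shortest_fitting_int<8>::type, int64_t>::value, "8 bytes need int64_t");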

Source: Andrzej's C++ blog: Handling short codes — part I.

Friday, 16 October 2015

Cool usecase for std::tie()


Until now I only thought of std::tie() as useful for decomposing several return values of a function that is faking multiple return values by using std::tuple:
  bool a; 
  SomeData b; 

  std::tie(a, b) = get_a_b_as_tuple();
But now look at that:
  struct S
  {
    int n;
    std::string s;
    float d;

    bool operator<(const S& rhs) const {
      // compares n to rhs.n,
      // then s to rhs.s,
      // then d to rhs.d
      return std::tie(n, s, d) < std::tie(rhs.n, rhs.s, rhs.d);
    }
  };
Nice, isn't it?

Source: CppCon2015 / Presentations / Simple Extensible Pattern Matching With C++14.

Update:
As it seems, this technique is considered to be pretty standard now, look here. If you are interested in TMP (and param pack expansions ;) you'll find there some techniques to make such comparators more flexible.

Sunday, 20 September 2015

HTTPS Support for a Casablanca Server and Certificate Error 1312


In the previous post about the Casablanca C++ REST framework I said that our next task would be adding HTTPS support to our solution. So let's do it.

1. The principle

We start with the server side. On the surface it is rather simple: you just specify the protocol the server is using to be "https://" and voila, everything is done. For example like that:
  _restListener = http_listener(U("https://localhost:8111/test"));
But if we peek into the code, we'll see that server-side HTTPS is only supported on Windows for now (Casablanca 2.5.0). OK, let's test it! We start the server, let Chrome get some data from the newly created SSL URL*, and... it won't work, apparently there's no such URL! What? We've just started our server there, and it's running.

After a second peek into Casablanca's code, we learn that it's leveraging the Windows HTTP Server API, but does not configure any HTTPS support by itself. Then we learn from the MSDN docs that this will usually be done in a configuration step, before the HTTP server is started. Apparently a programmatic solution is possible too, but we start with the simple one (or so we think...)**.

In short, we have to attach a server certificate to a port on the machine where our server runs (i.e. our developer machine, at least in this post). For that, we first have to create a new certificate.

2. First try

For the sake of our tests we want to create a self-signed certificate. On Windows we can use the MakeCert.exe tool for that. So I opened the Visual Studio developer console and typed***:

  makecert.exe -sr LocalMachine -ss Root -a sha512 -n "CN=my_machine_name" -sky exchange -pe -r

to create a self-signed (-r) cert for the local machine (required for the add sslcert step later), export its private key (-pe), use it for exchange (-sky) and store it straight away into the trusted Root Certification Authorities store. OK, it worked.

Next we open MMC and check that the certificate is there and has no errors. Don't know how? It's explained several times on the Web, for example here.

Now we only need to set the certificate for the server's IP address. The command for that is:

  netsh http add sslcert ipport=0.0.0.0:8111 certhash=XXXX appid={yyyyy}

For the certhash we can use the thumbprint from the certificate's Properties window in MMC; for the application ID the most frequent advice is to use the ID of the IIS Server, but as we do not even have it installed on our machine, we prefer to use an empty GUID: {00000000-0000-0000-0000-000000000000}, which is allowed too! Should be working OK!

Alas, the dreaded Error 1312 reared its ugly head:

  Certificate add failed, Error: 1312 A specified logon session does not exist. It may already have been terminated.

WTF? What is that supposed to mean? There are many possible reasons, like a bad certificate hash (nope, double-checked it), a missing Windows hot-fix (nope, it's 2010, already installed), a not-exported private key (nope, we did it), some mysterious cases when certificates get renewed (nope, these are fresh), etc, etc. But definitely not a problem with a logon session, that much is true!

3. The second try

What should we do? As always the advice is: go back to the last known working setup. To do this, I followed the advice in this blogpost quite meticulously, and created two certificates:

  makecert -sk testRootCA -sky signature -sr localmachine -n "CN=RootTrustedCA" -ss TRUST -r RootTrustedCA.cer

  makecert -sk testServer -ss MY -sky exchange -sr localmachine -n "CN=Server" -ic RootTrustedCA.cer -is TRUST Server.cer -pe


The first one is supposed to be the base and trusted one, the second one references the first. That didn't cut it either, because RootTrustedCA still wasn't trusted - I had to add it to the Root store using MMC (don't know how? - look here).

This time I didn't want to type the certificate hash, so I used this nice tool to apply the certificate to the desired IP address, and guess what - it worked! The last problem was that the domain and the certificate's names didn't match, so I created a cert with my machine's name:

  makecert -sk -ss MY -sky exchange -sr localmachine -n "CN=my_machine_name" -ic RootTrustedCA.cer -is TRUST my_machine_name.cer -pe

 applied it to 0.0.0.0:8111 and, at last, everything worked like a charm:




4. Conclusion

As it seems, add sslcert (and by extension, Windows' HTTP Server API) only supports certificates which reference a base trusted one, and doesn't like the ones from the Root store. Remember: the certificate has to be in the User store and has to be in the computer account (i.e. for the local machine)!

Update: The above conclusion is in principle correct, but only when we add two small words: "by default"! As explained in a friendly comment, add sslcert accepts a command line switch where the desired certificate store can be specified!

OK, that was a little tiring, we'll add support for the client side in another post.

PS: Later I had to add configuration of the SSL certificates to our server configuration code, lots of Windows system programming :-/.

--
* I mean TLS, but for historical reasons....

** Of course in a real server we'd implement a programmatic solution for configuring the SSL options for a given URL. You may look up the HTTP Server API's functions HttpSetServiceConfiguration and HttpQueryServiceConfiguration.
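If you're curious how that roughly looks, here's a sketch of mine (consult the MSDN docs before using it for real - it's essentially what "netsh http add sslcert" does under the hood):

  #include <windows.h>
  #include <http.h>
  #pragma comment(lib, "httpapi.lib")

  // binds a certificate (given by its raw thumbprint bytes) to an IP:port
  ULONG BindServerCert(SOCKADDR* ipPort, PVOID certHash, ULONG certHashLen)
  {
    HTTPAPI_VERSION version = HTTPAPI_VERSION_1;
    ULONG ret = HttpInitialize(version, HTTP_INITIALIZE_CONFIG, nullptr);
    if (ret != NO_ERROR)
      return ret;

    HTTP_SERVICE_CONFIG_SSL_SET sslSet = {};
    sslSet.KeyDesc.pIpPort = ipPort;                    // e.g. 0.0.0.0:8111
    sslSet.ParamDesc.pSslHash = certHash;               // the cert's thumbprint
    sslSet.ParamDesc.SslHashLength = certHashLen;
    sslSet.ParamDesc.pSslCertStoreName = const_cast<PWSTR>(L"MY");  // user store!
    sslSet.ParamDesc.AppId = GUID{};                    // empty GUID is allowed, see above

    ret = HttpSetServiceConfiguration(nullptr, HttpServiceConfigSSLCertInfo,
                                      &sslSet, sizeof(sslSet), nullptr);
    HttpTerminate(HTTP_INITIALIZE_CONFIG, nullptr);
    return ret;
  }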

*** not quite, first I tried to create a private certificate referencing an already existing, trusted one with:

  makecert.exe -sr LocalMachine -ss MY -a sha512 -n "CN=my_machine_name" my_machine_name -sky exchange -pe -eka 1.3.6.1.5.5.7.3.1

here the implicit Root Agency certificate is used. The problem is that the created certificate was broken; MMC said: "This certificate has an invalid digital signature". After much ado it turned out that the referenced Root Agency certificate uses a public key which is too short (only 512 bits!!!) and is thus considered insecure. Thanks for a very informative error message!


Casablanca C++ REST Framework - One Year Later.


Approximately one year ago I started a new project for one of my customers with the aim of adding a REST interface to their redesigned product. The new release was to transform their as-yet desktop-only and Windows-only application into a cross-platform, distributed, client-server one, using an HTTP API for communication.

The technologies to be used were Qt, C++11, Windows and Apple's OS X. The question was how to implement the REST interface.

The Setup

After investigating some choices*, I settled on Microsoft's C++ REST SDK, code-named Casablanca. It was cross-platform (sic!), used modern C++ constructs (C++11, you know by now that I like that!) and was open source (sic!). Sounds like it wasn't Microsoft, but I think we still need some time to get used to the new Microsoft.

There were some problems, though. The client chose the Qt 5 framework for portability, and initially I was worried whether Casablanca would play well with Qt's "I am the world" attitude, its message pump and threading model. Moreover, the server implementation resides in an "experimental" namespace, which normally isn't a good sign either!

On the positive side there was JSON support and a nice asynchronous file transfer implementation based on PPLX tasks (on Windows; for Linux, OS X, etc. Microsoft wrote a port). This was a big one, as the main functionality of the server would be processing files, and the input files would mostly be uploaded from other machines. And of course the biggest one - it's open source!

So the endeavor was not without risk! What can I say now, a year (give or take a couple of months) after we started?

Highlights

One of the highlights is of course the task-based implementation of asynchronous processing used in Casablanca's API, like here, in the already mentioned, built-in support for file transfers:
CCasablancaFileTransfer::Task CCasablancaFileTransfer::StartFileDownload(const QString& downloadUrl, const QString& localFilepath) const
{
  using concurrency::streams::istream;
  using concurrency::streams::streambuf;
  using concurrency::streams::file_buffer;

  web::http::uri url(downloadUrl.ToStdString());
  web::http::client::http_client client(url);

  web::http::http_request getRequest(web::http::methods::GET);
  getRequest.headers().add(web::http::header_names::accept, "application/octet-stream"); 

  return client.request(getRequest)
    .then([=](pplx::task<web::http::http_response> previousTask)
  {
    try
    {
      auto response = previousTask.get();

      if (response.status_code() != web::http::status_codes::OK)
      {
        QString errTxt = ".....";
        return pplx::task_from_result(std::make_pair(false, errTxt));
      }

      try
      { 
        streambuf<uint8_t> localFile = file_buffer<uint8_t>::open(localFilepath.ToStdString()).get();

        return response.body().read_to_end(localFile)
          .then([=](pplx::task<size_t> previousTask)
        {
          streambuf<uint8_t>& nonconstFile = const_cast<streambuf<uint8_t>&>(localFile);
          nonconstFile.close().get();

          // ETag?
          QString maybeEtag = ftutil::FindHeader(response, web::http::header_names::etag);

          return pplx::task_from_result(std::make_pair(true, maybeEtag));
        });
      }
      catch (...)
      {
        return TranslateFileException(localFilepath);
      }
    }
    catch (...)
    {
      return TranslateWebException();
    }   
  });
}
Please notice that each block following a .then() will be executed asynchronously, in a separately scheduled thread (or task), when the preceding step finishes! You can do the same on the client side, of course. Alternatively, you can force blocking processing of a task by calling its get() method.

If you like comparisons, you may have a look at Facebook's Futures in fbthrift. They generally work like Casablanca's tasks, but have an additional nice onError() clause and even the possibility to choose a specific executor!

Note: I won't give an introduction to Casablanca here, the basic usage has been explained several times on the Web (look here and here for basic client usage, here for a basic server example, and here for file transfers). However, what I found missing in all the intro material I've seen is a mention of exception propagation between asynchronous tasks. The problem is that a thrown exception has to be "observed" by the library user, and if it isn't observed, Casablanca will "fail fast", i.e. take down the server in the destructor of the task that threw. Surprise, surprise, your server is crashing! An exception is observed (i.e. marked as such and then rethrown) if the .get() or wait() method is called on that task or one of its continuations. So be cautious! The above code thus needs an additional try-catch clause around the final .get() call, but I omitted it for the sake of simplicity.
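For illustration, a minimal sketch of such an observing call site (hypothetical caller code; LogError is just a placeholder):

  auto task = fileTransfer.StartFileDownload(downloadUrl, localFilepath);
  try
  {
    auto result = task.get();   // rethrows anything a continuation threw
    // ... use result.first (success flag) and result.second ...
  }
  catch (const std::exception& e)
  {
    LogError(e.what());   // exception observed - no "fail fast" server crash
  }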

So it didn't take long, and I could announce:

Problems

1. The only really big problem that hit us was performance. The trouble was that after a couple of thousand file uploads or downloads, server performance tumbled into free fall: the same basic polling which normally took 2-3% of the CPU time surged to 20-30% after that! That was a real show stopper at first. And of course it wasn't our code that showed up in the profiler, it was some of Casablanca's internals!

It took me about 2 weeks to investigate, and maybe I'll write a more detailed post about it some time, but for now it suffices to say that the Windows Concurrency Runtime (i.e. the native, task-based implementation of PPL) was left with an ever-growing internal list, which was sequentially scanned on each tick of the scheduler - at least in our environment of Windows 7 and Visual Studio 2013 plus an early Casablanca version (1.2.0).

I reported it to Microsoft, and they reacted pretty quickly - the next release got a define (CPPREST_FORCE_PPLX) to disable the Concurrency Runtime and switch over to the Windows Threadpool based implementation of PPL. I got the latest version from the development branch, tested it, and voila! - our performance problems vanished into thin air. We then waited for the next release (2.5.0), and when it came out we upgraded our code to it and suddenly everything worked like a charm. BTW, for the Casablanca version for Visual Studio 2015 the Windows Threadpool is the default setting (or so I was told)**.

2. Another problem was the lack of built-in CORS support; I had to implement it myself. It wasn't that difficult, but it's bound to our mixed Casablanca/Qt environment, so unfortunately I couldn't contribute it back to the project.

3. Then there were Qt-specific problems, the first of them being Qt's notorious usage of the /Zc:wchar_t- flag on Windows. This means that Qt uses not the native type for the wide character (as the standard would require) but a typedef to unsigned short. Thus you won't be able to link Casablanca to your Qt-compliant project, because std::wstring will be mangled to $basic_string@GU by Qt (and your code) but to $basic_string@_WU by a standard Casablanca build.

The remedy is to build Casablanca on your own (i.e. not to use the NuGet package) with /Zc:wchar_t-. This will first fail with an error complaining about a double definition of wchar_t, but then commenting out the redundant definition will suffice. Hmm, when I think of it, I should probably contribute this back to Casablanca with some #ifdef combination...

4. Another Qt-related problem I stumbled upon was nothing less than a new type of deadlock (at least for me). I'll dub it the "big signal-mutex confusion deadlock". It's quite interesting though: imagine that your handler does one half of its work in a Casablanca thread but the second one in the main Qt thread. Then use some lock to protect a resource from parallel access.

Now imagine you lock the mutex in the Casablanca part of the GET handler and emit a signal to continue processing in the Qt context. Meanwhile another handler (e.g. DELETE) didn't need to lock in the Casablanca context, already emitted its signal and is now in the Qt context trying to lock the mutex. The problem is, the second part of the GET handler will never execute: the message pump is blocked waiting for the mutex, which would only be unlocked by the very slot the pump can no longer dispatch - deadlock. Mind-boggling? Well, threads are hard. Remedy: always lock the resource in the Casablanca part of the handler, even if you don't need it there.
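A minimal sketch of the situation (made-up names, the Qt signal/slot mechanics implied):

  // GET handler, first half, running in a Casablanca worker thread:
  void Handlers::OnGet()
  {
    _resourceMutex.lock();            // taken in the Casablanca context...
    emit ContinueGetInQtContext();    // ...to be released by the queued slot
  }

  // DELETE handler, second half, already running as a slot in the Qt main thread:
  void Handlers::OnDeleteInQtContext()
  {
    QMutexLocker lock(&_resourceMutex);   // blocks the Qt event loop, so the
                                          // ContinueGetInQtContext() slot never
                                          // runs and never unlocks: deadlock!
  }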

Admittedly, the problem looks somewhat artificial, but that's a consequence of the somewhat inane requirement that the requests should be finally processed in the Qt context - a requirement originating from another part of the system, which I won't discuss or disparage here.

5. A minor problem was timestamp comparison using Casablanca's utility::datetime class:
int CBasicLastModifiedMap::CompareSec(const utility::datetime& lhs, const utility::datetime& rhs)
{
#if _DEBUG
 // TEST:::
  auto lhsStrg = lhs.to_string();
  auto rhsStrg = rhs.to_string();
#endif

  int timestampDiffSec = lhs - rhs; // truncates up to seconds!

  if (timestampDiffSec == 0)
    return 0;
  // extra check, because timestampDiffSec always > 0 (Casablanca problem!)
  else if (lhs.to_interval() > rhs.to_interval())
    return 1;
  else
    return -1;
}
Bug or feature? Decide for yourself.

6. One (minor?) problem we encountered was specifying all interfaces for the server to bind on. Normally you'd expect to be able to use "0.0.0.0", but Casablanca rejects it. After consulting the source code the solution was clear: just use "*" instead!***
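I.e. instead of the localhost binding shown in the previous post, something like:

  _restListener = http_listener(U("http://*:8111/test"));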

Conclusion

Otherwise: we are quite happy!!! The system works cross-platform, the performance is good, no apparent problems there.

Well, no problems till now. My client didn't specify any security for the first version of the product; they assumed (quite reasonably) that the customers would use the new product only inside their trusted network at first. Neither is any notion of client identification or client roles planned, nor are we using gzipping of the HTTP data. Thus the more advanced features of an HTTP server weren't required.

As the next step towards more complicated scenarios, we'll look at HTTPS support in Casablanca. See you then...

Update:

OK, we actually found one weird bug while testing: "http_listener crashes when URI contains a pending square bracket" (bug report + proposed fix here). I had to patch it locally for the time being, but as it seems, it'll be fixed in version 2.6. Weird - other malformed URLs are rejected OK, only square brackets generate crashes.
Update 2: the same problem with square brackets in HTTP parameters, only I haven't resolved it yet (no time, a low-prio bug). So the story continues.
Update 3: The above problem resolved, see my code comments for explanation:
  // As of Casablanca 2.5 http_request::relative_uri() will throw exception if it encounters (correctly) encoded
  // "[", "]" or "#" characters in the query parameters!
  //  - workaround: try to extract the relative path by hand (message.absolute_uri().path() works)

Another Update (07 Sep. 2016):

I said we were overall happy with Casablanca, but there's a new problem, and it could be pretty grave. Namely, the Casablanca server tends to crash sometimes. In normal operation it's very rare, but under severe overload it may happen rather more often :(. At first I thought other parts of the code had some dangling pointers, and put this error into a waiting room.

Recently I took some time to analyze it, and it seems to be a genuine Casablanca problem, which is somehow connected to the handling of timed-out client connections (as it seems at the moment). In release 2.8.0 there was a pull request that refactored this part of the code to remove some race conditions, but I'm not sure if this was sufficient to fix the crash...

As soon as I solve it, I'll blog. For the moment take heed: the server part is still in the "experimental" namespace (as of 2.8.0) and:
"The http_listener code is the least tested code in the library despite its age, so there are certainly a lot of issues here that need fixing."
--
* like: QtWebApp (by Stefan Frings), Qxt's WebModule, Tufao, QHttpServer, Pillow, nanogear, Apache's Axis2/C, Casablanca (of course!), cpp-netlib, gSoap WebServer, microhttpd (by GNU), libhttpserver (based on microhttpd), Mongoose, libevhttp. They were (for the most part) fine, sometimes even great, but virtually none of them had C++11 or async file transfer support!

Update: Facebook's Wangle could be a good candidate too, it seems to have decent async support (see here), but...: at that time I didn't find it (definitely a show-stopper ;). At the time of this writing it has problems compiling on OS X, and I'm not sure if it compiles on Windows at all. There's simply no statement on the project's page about which platforms are supported. AFAIK Thrift (or was it Folly?) seems to compile on Windows, but this compiling mess could be a problem.

** as it seems, the Concurrency Runtime isn't used anymore in Microsoft's STL implementation (at least on OSes newer than XP) - http://blogs.msdn.com/b/vcblog/archive/2015/07/14/stl-fixes-in-vs-2015-part-2.aspx:
"Using ConcRT was a good idea at the time (2012), but it proved to be more trouble than it was worth.  Now we're using the Windows API directly, which has fixed many bugs."
and:
"... std::mutex on top of ConcRT was so slow!"  
*** I was recently asked about that problem in an email from a reader of this post, and I realized I had forgotten to mention it in the first writeup. Sorry!