Approximately one year ago I started a new project for one of my customers with the aim of adding a REST interface to their redesigned product. The new release should transform their as yet desktop-only and Windows-only application into a cross-platform, distributed, client-server one, using an HTTP API for communication.
The technologies to be used were Qt, C++11, Windows and Apple's OS X. The question was how to implement the REST interface.
The Setup
After investigating some choices*, I settled on Microsoft's
C++ REST SDK code-named
Casablanca. It was cross-platform (sic!), used
modern C++ constructs (C++11, you knew I'd like that!) and was open source (sic!). That doesn't sound like Microsoft, but I think we still need some time to get used to the
new Microsoft.
There were some problems, though. The client chose the Qt 5 framework for portability, and initially I was worried whether Casablanca would play well with Qt's "I am the world" attitude, its message pump and threading model. Moreover, the server implementation resides in an
"experimental" namespace, which normally isn't a good sign either!
On the positive side there was JSON support and a nice asynchronous file transfer implementation based on PPLX tasks (native on Windows; for Linux, OS X, etc. Microsoft wrote a port). This was a big one, as the main functionality of the server would be processing files, and the input files would mostly be uploaded from other machines. And of course the biggest one - it's open source!
So the endeavor was not without risk! What can I say a year, give or take a couple of months, after we started?
Highlights
One of the highlights is of course the task-based implementation of asynchronous processing used in Casablanca's API, like here, in the already mentioned, built-in support for file transfers:
CCasablancaFileTransfer::Task CCasablancaFileTransfer::StartFileDownload(const QString& downloadUrl, const QString& localFilepath) const
{
    using concurrency::streams::istream;
    using concurrency::streams::streambuf;
    using concurrency::streams::file_buffer;

    web::http::uri url(downloadUrl.toStdString());
    web::http::client::http_client client(url);
    web::http::http_request getRequest(web::http::methods::GET);
    getRequest.headers().add(web::http::header_names::accept, "application/octet-stream");

    // Step 1: send the request; the continuation runs when the response headers arrive.
    return client.request(getRequest)
        .then([=](pplx::task<web::http::http_response> previousTask)
        {
            try
            {
                auto response = previousTask.get();
                if (response.status_code() != web::http::status_codes::OK)
                {
                    QString errTxt = ".....";
                    return pplx::task_from_result(std::make_pair(false, errTxt));
                }
                try
                {
                    // Step 2: stream the response body into the local file.
                    streambuf<uint8_t> localFile = file_buffer<uint8_t>::open(localFilepath.toStdString()).get();
                    return response.body().read_to_end(localFile)
                        .then([=](pplx::task<size_t> previousTask)
                        {
                            // Captured by value, hence const inside the lambda.
                            streambuf<uint8_t>& nonconstFile = const_cast<streambuf<uint8_t>&>(localFile);
                            nonconstFile.close().get();
                            // ETag?
                            QString maybeEtag = ftutil::FindHeader(response, web::http::header_names::etag);
                            return pplx::task_from_result(std::make_pair(true, maybeEtag));
                        });
                }
                catch (...)
                {
                    return TranslateFileException(localFilepath);
                }
            }
            catch (...)
            {
                return TranslateWebException();
            }
        });
}
Please notice that each block following a
.then() will be executed asynchronously, in a separately scheduled thread (or task), once the preceding step has finished! You can do the same on the client side of course. Alternatively you can force blocking processing of a task by calling its
get() method.
If you like comparisons, you may have a look at Facebook's
Futures in
fbthrift. They generally work like Casablanca's tasks, but have an additional nice
onError() clause and even the possibility to choose a specific executor!
Note: I won't give here an introduction to Casablanca, the basic usage was explained several times on the Web (look
here and
here for basic client usage,
here for basic server example, and
here for file transfers). However,
what I found missing in all the intro material I've seen is any mention of exception propagation between asynchronous tasks. The problem is that a thrown exception has to be "observed" by the library user, and if it isn't observed, Casablanca will "fail fast", i.e. take down the server in the destructor of the task that threw.
Surprise, surprise, your server is crashing! An exception is observed (i.e. marked as such and then rethrown) if the .get() or wait() methods are called for that task or one of its continuations. So be cautious! The above code thus needs an additional try-catch clause around the final
.get() call, but I omitted it for the sake of simplicity...
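To make the "observe at the final get()" rule concrete, here is a minimal sketch that uses std::future from the standard library as a stand-in for pplx::task, so it compiles without Casablanca. The function names are made up for illustration. Note one difference: an unobserved std::future exception is silently dropped, whereas Casablanca fails fast as described above.

```cpp
#include <cassert>
#include <future>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for an asynchronous step that fails,
// e.g. a download whose connection is reset.
std::future<int> startFailingStep()
{
    return std::async(std::launch::async, []() -> int {
        throw std::runtime_error("connection reset");
    });
}

// The final get() rethrows whatever the asynchronous step threw,
// so this is the place where the try-catch belongs.
std::pair<bool, std::string> runAndObserve()
{
    std::future<int> step = startFailingStep();
    try
    {
        int bytes = step.get(); // exception is observed (rethrown) here
        return std::make_pair(true, std::to_string(bytes));
    }
    catch (const std::exception& e)
    {
        return std::make_pair(false, std::string(e.what()));
    }
}
```

Calling runAndObserve() yields a (false, error text) pair instead of a crash, which mirrors the std::make_pair(false, errTxt) convention of the download code above.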
So it didn't take long, and I could announce:
Problems
1. The only really big problem that hit us was
the performance. The trouble was that after a couple of thousand file uploads or downloads, server performance went into free fall: the same basic polling which normally took 2-3% of the CPU time surged to 20-30% after that! That was a real show stopper at first. And of course it wasn't our code that showed up in the profiler, it was some of Casablanca's internals!
It took me about 2 weeks to investigate that, and maybe I'll write a more detailed post about it some time, but for now it suffices to say that Windows
Concurrency Runtime (i.e. the native, task-based implementation of PPL) was left with an ever-growing internal list, which was sequentially scanned with each tick of the scheduler - at least in our environment of Windows 7 and Visual Studio 2013 plus an early Casablanca version (1.2.0).
I reported it to Microsoft, and they reacted pretty quickly - the next release was given a define (
CPPREST_FORCE_PPLX) to disable Concurrency Runtime and switch over to a Windows Threadpool based implementation of PPL. I got the latest version from the development branch, tested it, and voila! our performance problems vanished into thin air. We then waited for the next release (2.5.0), and when it came out, we upgraded our code to it and suddenly everything worked like a charm. BTW, for the Casablanca version for Visual Studio 2015, Windows Threadpool is the default setting (or so I was told)**.
2. Another problem was the
lack of built-in CORS support; I had to implement it myself. It wasn't that difficult, but it's bound to our mixed Casablanca/Qt environment, so unfortunately I couldn't contribute it back to the project.
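For readers wondering what "implementing CORS yourself" boils down to, here is a rough sketch of the header logic only. It is not the code from the project: the helper name, the allowed methods and headers, and the max-age value are all assumptions chosen for illustration. Only the Access-Control-* and Vary header names are the standard ones.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using Headers = std::map<std::string, std::string>;

// Hypothetical helper: returns the CORS headers to attach to a response
// (including the reply to an OPTIONS preflight) for a given Origin value.
// An origin that is not on the allow list gets no CORS headers at all,
// so the browser will block the cross-origin call.
Headers corsHeadersFor(const std::string& origin,
                       const std::set<std::string>& allowedOrigins)
{
    Headers headers;
    if (allowedOrigins.count(origin) == 0)
        return headers;

    headers["Access-Control-Allow-Origin"] = origin;
    headers["Vary"] = "Origin"; // the answer depends on the caller's origin
    headers["Access-Control-Allow-Methods"] = "GET, POST, PUT, DELETE, OPTIONS";
    headers["Access-Control-Allow-Headers"] = "Content-Type, Accept";
    headers["Access-Control-Max-Age"] = "600"; // seconds a preflight may be cached
    return headers;
}
```

In a Casablanca handler the resulting pairs would then be copied into the response's headers() collection, in the same way the download code above adds the Accept header to the request.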
3. Then there were
Qt-specific problems, the first of them being Qt's notorious use of the
/Zc:wchar_t- flag on Windows. This means that for wide characters Qt uses not a native type (as the standard would require) but a
typedef to
unsigned short. Thus you won't be able to link Casablanca to your Qt-compliant project, because
std::wstring will be mangled to
$basic_string@GU by Qt (and your code) but to
$basic_string@_WU by the standard Casablanca build.
The remedy is to build Casablanca on your own (i.e. not to use the NuGet package) with
/Zc:wchar_t-. This will first fail with an error complaining about a double definition for
wchar_t, but then commenting out the redundant definition will suffice. Hmm, when I think of this, I should probably contribute this back to Casablanca with some
#ifdef combination...
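Such an #ifdef combination could look roughly like the sketch below. It relies on _NATIVE_WCHAR_T_DEFINED, which Visual C++ predefines only when wchar_t is a native type (i.e. when /Zc:wchar_t- is not in effect). The macro HAS_NATIVE_WCHAR_T and the describe() overloads are invented for the example and are not taken from the Casablanca sources.

```cpp
#include <cassert>
#include <string>

// Visual C++ defines _NATIVE_WCHAR_T_DEFINED only if wchar_t is a real type.
// Other compilers always treat wchar_t as a distinct type.
#if !defined(_MSC_VER) || defined(_NATIVE_WCHAR_T_DEFINED)
#define HAS_NATIVE_WCHAR_T 1
#else
#define HAS_NATIVE_WCHAR_T 0
#endif

// Hypothetical overload pair: with /Zc:wchar_t- both signatures would be
// identical (wchar_t is then just unsigned short), which is the kind of
// "double definition" error mentioned above.
inline std::string describe(unsigned short) { return "unsigned short"; }

#if HAS_NATIVE_WCHAR_T
inline std::string describe(wchar_t) { return "wchar_t"; }
#endif
```

With the guard in place the same source builds in both modes; in a /Zc:wchar_t- build, wide characters simply take the unsigned short overload.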
4. Another Qt-related problem I stumbled upon was nothing more than
a new type of deadlock (at least for me). I'll dub it the
"big signal-mutex confusion deadlock". It's quite interesting though: imagine that your handler does one half of the work in a Casablanca thread but the second one in the main Qt-thread. Then use some lock to protect a resource from parallel access.
Now imagine you lock the mutex in the Casablanca part of the GET handler and emit a signal to continue processing in the Qt context. Meanwhile another handler (e.g. DELETE), which didn't need to lock in the Casablanca context, has already emitted its signal and is now in the Qt context trying to lock the mutex. The problem is that the second part of the GET handler will never execute, because the message pump is blocked waiting for the mutex, which can only be unlocked by the next signal to be processed - deadlock. Mind-boggling? Well, threads are hard. Remedy: always lock the resource in the Casablanca part of the handler, even if you don't need it there.
Admittedly, the problem looks somewhat artificial, but that's a consequence of the somewhat inane requirement that the requests should finally be processed in the Qt context, a requirement originating from another part of the system, which I won't discuss or disparage here.
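The remedy can be sketched without Qt or Casablanca. In the sketch below, EventLoop is a simplified stand-in for the Qt message pump, the two std::thread workers stand in for Casablanca threads, and ResourceLock is a hand-rolled lock that may be released from a different thread than the one that acquired it. All class and function names are assumptions for illustration, not the project's code. Both handlers take the lock in their worker part, so the pump itself never blocks on it.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Lock that can be released by another thread (std::mutex cannot).
class ResourceLock {
public:
    void acquire() {
        std::unique_lock<std::mutex> guard(m_);
        cv_.wait(guard, [this] { return !held_; });
        held_ = true;
    }
    void release() {
        {
            std::lock_guard<std::mutex> guard(m_);
            held_ = false;
        }
        cv_.notify_one();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool held_ = false;
};

// Stand-in for the message pump: callbacks run one by one on the thread calling run().
class EventLoop {
public:
    void post(std::function<void()> fn) {
        {
            std::lock_guard<std::mutex> guard(m_);
            queue_.push_back(std::move(fn));
        }
        cv_.notify_one();
    }
    // Processes exactly `count` callbacks, then returns.
    void run(std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) {
            std::function<void()> fn;
            {
                std::unique_lock<std::mutex> guard(m_);
                cv_.wait(guard, [this] { return !queue_.empty(); });
                fn = std::move(queue_.front());
                queue_.pop_front();
            }
            fn();
        }
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> queue_;
};

std::vector<std::string> runBothHandlers() {
    ResourceLock resource;
    EventLoop loop;
    std::vector<std::string> finished; // only touched on the loop thread

    // Every handler locks in its worker part, then continues in the loop.
    auto handler = [&](const std::string& name) {
        resource.acquire(); // may block, but only a worker thread
        loop.post([&resource, &finished, name] {
            finished.push_back(name);
            resource.release(); // second half done, free the resource
        });
    };

    std::thread getWorker(handler, std::string("GET"));
    std::thread deleteWorker(handler, std::string("DELETE"));
    loop.run(2);
    getWorker.join();
    deleteWorker.join();
    return finished;
}
```

If the DELETE handler instead called resource.acquire() inside its posted callback, the loop thread could end up waiting for a release that only a later callback in the same queue can perform, which is exactly the deadlock described above.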
5. A minor problem was
timestamp comparison using Casablanca's
utility::datetime class:
int CBasicLastModifiedMap::CompareSec(const utility::datetime& lhs, const utility::datetime& rhs)
{
#if _DEBUG
    // TEST:::
    auto lhsStrg = lhs.to_string();
    auto rhsStrg = rhs.to_string();
#endif
    int timestampDiffSec = lhs - rhs; // truncates to whole seconds!
    if (timestampDiffSec == 0)
        return 0;
    // extra check, because timestampDiffSec is always > 0 (Casablanca problem!)
    else if (lhs.to_interval() > rhs.to_interval())
        return 1;
    else
        return -1;
}
Bug or feature? - decide for yourself.
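One way to sidestep the sign problem altogether is to compare the raw interval values and never subtract the smaller from the larger. The sketch below works on plain 64-bit integers instead of utility::datetime so that it compiles on its own. It assumes that the interval counts 100-nanosecond ticks, and the function name is made up.

```cpp
#include <cassert>
#include <cstdint>

// Compares two tick counts at one-second granularity.
// Assumption: one tick is 100 ns, i.e. 10,000,000 ticks per second.
// Returns 0 if the values are less than a second apart, else 1 or -1.
int compareSec(std::uint64_t lhsTicks, std::uint64_t rhsTicks)
{
    const std::uint64_t ticksPerSecond = 10000000;

    // Always subtract the smaller from the larger: unsigned arithmetic
    // would otherwise wrap around to a huge positive number.
    const std::uint64_t absDiff =
        lhsTicks > rhsTicks ? lhsTicks - rhsTicks : rhsTicks - lhsTicks;

    if (absDiff / ticksPerSecond == 0)
        return 0;
    return lhsTicks > rhsTicks ? 1 : -1;
}
```

With real timestamps the two arguments would be lhs.to_interval() and rhs.to_interval().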
6. One (minor?) problem we encountered was specifying all interfaces for the server to bind on. Normally you'd expect to be able to use
"0.0.0.0", but Casablanca rejects it. After consulting the source code the solution was clear: just use
"*" instead!***
Conclusion
Otherwise: we are quite happy!!! The system works cross-platform, the performance is good, no apparent problems there.
Well, no problems till now. My client didn't specify any security for the first version of the product; they assumed (quite reasonably) that the customers will use the new product only inside their trusted network at first. Neither is any notion of client identification or client roles planned, nor are we using gzipping of the HTTP data. Thus the more advanced features of an HTTP server weren't required.
As the next step towards more complicated scenarios, we'll look at HTTPS support in Casablanca. See you then...
Update:
OK, we actually found one weird bug while testing:
"http_listener crashes when URI contains a pending square bracket" (bug report + proposed fix
here). I had to patch it locally for the time being, but it seems it'll be fixed in version 2.6. Weird, other malformed URLs are rejected OK, only square brackets generate crashes.
Update 2: same problem with square brackets in HTTP parameters, only I haven't resolved it yet (no time, a low prio bug). So the story continues.
Update 3: The above problem is resolved, see my code comments for an explanation:
// As of Casablanca 2.5 http_request::relative_uri() will throw exception if it encounters (correctly) encoded
// "[", "]" or "#" characters in the query parameters!
// - workaround: try to extract the relative path by hand (message.absolute_uri().path() works)
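Extracting the relative path "by hand" could look roughly like this. It is a sketch on plain strings, not the project's code: the function name is hypothetical, and it assumes that the first argument is what message.absolute_uri().path() returned and the second is the path the listener was registered on.

```cpp
#include <cassert>
#include <string>

// Strips the listener's base path from an absolute request path.
// Returns the absolute path unchanged if it does not lie under the base.
std::string relativePath(const std::string& absolutePath, std::string basePath)
{
    // Normalize "/api/" to "/api" (and "/" to "").
    while (!basePath.empty() && basePath.back() == '/')
        basePath.pop_back();

    // The base must match a whole path segment, so "/api" must not match "/apiary".
    const bool underBase =
        absolutePath.compare(0, basePath.size(), basePath) == 0 &&
        (absolutePath.size() == basePath.size() ||
         absolutePath[basePath.size()] == '/');
    if (!underBase)
        return absolutePath;

    std::string rest = absolutePath.substr(basePath.size());
    return rest.empty() ? "/" : rest;
}
```

Since only the path component is touched, the encoded brackets in the query string never reach the code that throws.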
Another Update: (07 Sep. 2016)
I said we were happy with Casablanca overall, but there's a new problem, which could be pretty grave. Namely, the
Casablanca server tends to crash sometimes. In normal operation this is very rare, but under severe overload it may happen rather more often :(. At first I thought some other part of the code had dangling pointers, and put this error on the waiting list.
Recently I took some time to analyze it, and it seems to be a genuine Casablanca problem, which is somehow connected to the handling of timed-out client connections (as it seems at the moment). In Release 2.8.0 there was a
pull request that refactored this part of the code to remove some race conditions, but I'm not sure if this was sufficient to fix the crash...
As soon as I've solved that, I'll blog about it. For the moment take heed: the server part is still in the "experimental" namespace (as of 2.8.0)
and:
"The http_listener
code is the least tested code in the library despite its age, so there are certainly a lot of issues here that need fixing."
--
* like:
QtWebApp (by Stefan Frings),
Qxt's
WebModule,
Tufao,
QHttpServer,
Pillow,
nanogear, Apache's
Axis2/C,
Casablanca (of course!),
cpp-netlib,
gSoap WebServer,
microhttpd (by GNU),
libhttpserver (based on
microhttpd),
Mongoose,
libevhttp. They were (for the most part) fine, sometimes even great, but virtually none of them had C++11 or async file transfer support!
Update: Facebook's
Wangle could be a good candidate too, it seems to have a decent async. support (see
here), but.... : at that time I didn't find it (definitely a show-stopper ;). At the time of this writing it's got problems compiling on OS X, and I'm not sure if it compiles on Windows at all. There's simply no statement on the project's page what platforms are supported. AFAIK
Thrift (or was it
Folly?) seems to compile on Windows, but this compiling mess could be a problem.
** as it seems, Concurrency Runtime isn't used anymore in Microsoft's STL implementation (at least in OS's newer than XP) -
http://blogs.msdn.com/b/vcblog/archive/2015/07/14/stl-fixes-in-vs-2015-part-2.aspx:
"Using ConcRT was a good idea at the time (2012), but it proved to be more trouble than it was worth. Now we're using the Windows API directly, which has fixed many bugs."
and:
"... std::mutex on top of ConcRT was so slow!"
*** I was recently asked by a reader of this post in an email about that problem, and I realized I forgot to mention that in the 1st writeup. Sorry!