Wednesday, 27 July 2016

Structure-like std::tuple usage

While reading the WG21 proposal P0095R0* I stumbled upon a cool trick. If you define the following structures:
  struct x { double value; };
  struct y { double value; };
  struct z { double value; };

  using point = std::tuple<x, y, z>;
you will then be able to write:
  point pt;

  auto xVal = std::get<x>(pt).value;
  auto yVal = std::get<y>(pt).value;
  auto zVal = std::get<z>(pt).value;
Look at that! Now we can use std::get<x> to fetch the 'x' value of the tuple, std::get<y> for 'y', and so on, instead of the plain, old (and ugly...):
  using point = std::tuple<double, double, double>;

  point pt;

  auto xVal = std::get<0>(pt);
  auto yVal = std::get<1>(pt);
  auto zVal = std::get<2>(pt);
Cool, isn't it?
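For the record, here's the whole trick as one minimal, compilable example (it needs C++14 for the type-based std::get; the pt variable and the printing are my additions):
  #include <tuple>
  #include <iostream>

  struct x { double value; };
  struct y { double value; };
  struct z { double value; };

  using point = std::tuple<x, y, z>;

  int main()
  {
      point pt{ x{1.0}, y{2.0}, z{3.0} };

      // C++14's get-by-type selects the element by its wrapper struct
      std::cout << std::get<x>(pt).value << ", "
                << std::get<y>(pt).value << ", "
                << std::get<z>(pt).value << "\n";
  }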

But the author dismisses such tricks on the spot:
"Should we use this approach everywhere and deprecate the use of struct in any context? In the author's opinion we should not. The use of wrapper types is much more complicated to both read and understand than a plain struct. For example, the wrapper types that were introduced, such as the 'x' type, make little sense outside of their corresponding tuples, yet they are peers to it in scope. Also, the heavy syntax makes it difficult to understand exactly what is intended by this code."
...on the premise that it is too clever. Come on, that little bit of extra code in one place buys you readability all over your source code! But wait, it gets even harder:
"While the utility of type selection and SFINE for visitors is quite clear for advanced C++ developers, it presents significant hurdles for the beginning or even intermediate developer. This is especially true when it is considered that the visit function is the only way to guarantee a compilation error when all cases are not considered."
That got me thinking. First I wanted to dismiss the argument along the lines of: "what, how can a C++ developer not know what SFINAE is...", but then another thought grew stronger and stronger: "come down from the expert-only, high-church, pure-blood stance, C++ should be usable for beginners too!". Just think about yourself trying to learn a new programming language for a minute. Do you want it esoteric and complicated? Not really. So this is really a show stopper, kind of...

Because we don't want to give up our freshly discovered syntactic candy so easily, do we?

Thus, I think we need a new syntax extension for C++17 (or C++0y?)**:
  using point = std::tuple<x : double, y : double, z : double>;

  point pt;

  auto xVal = std::get<x>(pt);
  auto yVal = std::get<y>(pt);
  auto zVal = std::get<z>(pt);
Nowadays everybody gets his/her pet feature added to C++17 (just look at the if and switch initializers), so why not?

Come on guys! Who's with me? 

* "The Case for a Language Based Variant", C++ Standard proposal P0095R0. It discusses advantages of language based variant implementation vs. a library-only solution.

** Of course we could use the brand-new structured bindings (a.k.a. destructuring), as in:
  using point = std::tuple<double, double, double>;

  point pt;
  auto [x, y, z] = pt;
but that's not what we want. We need named access to single elements of the tuple. And don't say we shouldn't use a tuple, because we simply want to!

Friday, 15 July 2016

A fairytale found on Stack Overflow

Somewhere on the Internets there is that little fairy tale about a little princess*, written in the language of dreams, i.e. Javascript:

  function princess() {

      var adventures = [];

      function princeCharming() { /* ... */ }
      var unicorn = { /* ... */ },
          dragons = [ /* ... */ ],
          squirrel = "Hello!";

      return {
          story: function() {
              return adventures[adventures.length - 1];
          }
      };
  }

  var littleGirl = princess();
  littleGirl.story();

Can you read it? If not, there is this attempt at a translation by Patrick M:
"...the princess() function is a complex scope containing private data. Outside the function, the private data can't be seen or accessed. The princess keeps the unicorns, dragons, adventures etc. in her imagination (private data) and the grown-ups can't see them for themselves. BUT the princess's imagination is captured in the closure for the story() function, which is the only interface the littleGirl instance exposes into the world of magic. – Patrick M Feb 28 '13 at 7:49"  
Still an alien language? Learn some Javascript closures, would you? And then a story unfolds:
[Image: Domenichino, public domain, via Wikimedia Commons]
Once upon a time, there was a princess whose world was full of adventures: dragons, unicorns, talking squirrels (Narnia?). But because in this world she was only a little girl, she could only tell stories about her adventures. And nobody would believe her, because she couldn't take anybody into her magical land (due to a curse of an old, ugly witch called Ecma). So the princess was both happy and sad, because she couldn't share her adventures with anyone. And that's the tragedy of sorcery: you always have to pay a price for it...
Now that you are in the know, did you notice that the princess doesn't tell anyone about Prince Charming?

* source:

Sunday, 5 June 2016

Converting shared_ptr of Derived to shared_ptr of Base

Years and years ago, as C++ templates were gaining prominence, the common problem users had with them was to understand how the types of particular template instantiations were related. E.g. given the following classes:
  class Base {};
  class Derived : public Base {};

  SomeTempl<Base> tb;
  SomeTempl<Derived> td;
it was somehow non-intuitive that the class of the td object was not a subclass of tb's class. But well, that wasn't that different from the basic language rules for array types:
  Base ab[100];
  Derived ad[100];
Here ab isn't in any way related to ad. So some users wished that templates would be somehow related based on their parameters, but it was an easy task to persuade them that they were plain wrong. And taking into account that in those distant times templates were mostly used as implementation helpers for defining containers, this wasn't even that wrong.

Well, until...

You guessed it, until C++11 arrived, and the "C++ renaissance" and "modern C++" was announced. As you might know, in "modern" C++ you shouldn't use naked pointers, instead, you use the appropriate smart pointer class unique_ptr<>, shared_ptr<> or weak_ptr<>.

Ooops, now we don't have a container on our hands! It's rather a kind of decorator: it takes a type and creates a new, but semantically not really different one. What I mean here is that in this case we have a warranted need for template types to be related based on their arguments! I.e.:
  unique_ptr<Base> pb;
  unique_ptr<Derived> pd;
pd should behave as a subclass of pb, because the unique_ptr<> "decorator" template's intention is to preserve the argument type's pointer semantics! So what now? Language rules for templates cannot be changed, the types cannot be related, so if you want to write a covariant member function using pointer types (like here:)
  class X
  {
  public:
      virtual Base* calculemus();
  };

  class Y : public X
  {
  public:
      Derived* calculemus() override;  // covariant return type
  };
you cannot use smart pointers instead. That brought us a slew of Stack Overflow questions of the general type "How to convert shared_ptr<> to...".

So indeed, what can be done? How do we convert one to the other? As I said before, the template rules cannot be changed at this point. So if not in the language definition, maybe we can solve it in the library?

Well, let's try. If we take, for example, a close look at the definition of unique_ptr<T> in the C++11 standard (through cppreference this time), we can find the following converting constructor:
template< class U, class E >
unique_ptr( unique_ptr<U, E>&& u );    (6)
6) Constructs a unique_ptr by transferring ownership from u to *this
This constructor only participates in overload resolution if all of the following is true:
  a) unique_ptr<U, E>::pointer is implicitly convertible to pointer
  b) U is not an array type
  c) Either Deleter is a reference type and E is the same type as D, or Deleter is not a reference type and E is implicitly convertible to D
We see that the converting constructor is enabled via SFINAE only when the pointer conversion is sound (probably by std::is_convertible or something like that). Nice!
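If you're curious how such a SFINAE constraint could be spelled, here's a rough sketch of the mechanism on a toy my_unique_ptr of mine (the real library code is more involved, e.g. it also has to handle the deleter):
  #include <type_traits>

  template <typename T>
  class my_unique_ptr
  {
  public:
      explicit my_unique_ptr(T* p = nullptr) noexcept : _ptr(p) {}
      ~my_unique_ptr() { delete _ptr; }

      my_unique_ptr(const my_unique_ptr&) = delete;
      my_unique_ptr& operator=(const my_unique_ptr&) = delete;

      // the converting move constructor: SFINAE lets it participate in
      // overload resolution only if U* is implicitly convertible to T*
      template <typename U,
                typename = typename std::enable_if<
                    std::is_convertible<U*, T*>::value>::type>
      my_unique_ptr(my_unique_ptr<U>&& other) noexcept
          : _ptr(other.release()) {}

      T* release() noexcept { T* p = _ptr; _ptr = nullptr; return p; }
      T* get() const noexcept { return _ptr; }

  private:
      T* _ptr;
  };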

So the right way of converting a unique_ptr<Derived> into a unique_ptr<Base> is simply to move the former into the latter! I knew the C++ committee wouldn't leave us in the lurch!
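In code, a minimal sketch of mine (Base and Derived as above):
  #include <memory>

  class Base { public: virtual ~Base() = default; };
  class Derived : public Base {};

  int main()
  {
      std::unique_ptr<Derived> pd(new Derived);
      std::unique_ptr<Base> pb = std::move(pd);   // the converting constructor kicks in

      // shared_ptr has an analogous converting constructor, no move needed:
      std::shared_ptr<Derived> sd = std::make_shared<Derived>();
      std::shared_ptr<Base> sb = sd;
  }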

PS: And if you want to know how to use that possibility to somehow emulate covariant return types through smart pointers, have a look at Arne Mertz's blogpost:

Tuesday, 29 March 2016

HTTPS Support for a Casablanca Client

This post is a continuation of a previous one about SSL and the Casablanca C++ REST-API library by Microsoft. There we added SSL support to a Casablanca server; this time we'll do it on the client side. At first sight it seems to be simple, just write:
  web::http::client::http_client restClient(U("https://xxx.yyy.zzz"));
  auto response = restClient.request(web::http::methods::GET);
and go on! As we saw, the only thing to do is to use an HTTPS URI. Nothing more? Not really! The main problem when adding SSL support to the client turned out to be the error handling. More specifically, I wanted to differentiate between an SSL error and a simple no-connection error.

Casablanca uses the new standard std::error_condition/error_category classes for that. The problem is that these classes are as yet poorly understood by programmers (at least by me, because they are so new)*, and, what is much worse, that their usage by Casablanca (as of v.2.5.0) isn't consistent between the Windows and Linux code!

On Windows, there is a special windows_category defined for system errors, while on Linux the standard std::system_category is used. Worse, the error codes forwarded there come from Boost ASIO, which in turn just forwards OpenSSL's error codes. Which again depend on the library version :-/.
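To recap how this std machinery is supposed to work, here a minimal sketch, independent of Casablanca (I use the POSIX code ETIMEDOUT as an example):
  #include <cerrno>
  #include <iostream>
  #include <system_error>

  int main()
  {
      // map a raw, platform-specific error code into the portable (generic) space
      std::error_condition ec = std::system_category().default_error_condition(ETIMEDOUT);

      if (ec == std::errc::timed_out)
          std::cout << "portable code " << ec.value()
                    << " in category '" << ec.category().name() << "'\n";  // "generic"
  }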

To tame this chaos a little, the following little helper class was born:
class CCasablancaClientErrorCode
{
public:
    CCasablancaClientErrorCode(int errCode) : _errCode(errCode) {}

    /** Check and translate to a POSIX-conforming system error code.

        For HTTP connection problems the following codes are used by Casablanca:
            std::errc::host_unreachable, std::errc::timed_out, std::errc::connection_aborted
    */
    bool IsStdSystemError(std::errc& stdErrorCode) const
    {
#ifdef _WINDOWS
        const std::error_condition ec = utility::details::windows_category().default_error_condition(_errCode);
#else
        const std::error_condition ec = std::system_category().default_error_condition(_errCode);
#endif
        // genericCategoryStrg is assumed to be the std::string "generic"
        if (ec.category().name() != genericCategoryStrg)
            return false;

        stdErrorCode = std::errc(ec.value());
        return true;
    }

    /** Check if the reported error comes from SSL.

        There is only one, generic server SSL certificate error at the moment!
    */
    bool IsSslError() const
    {
#ifdef _WINDOWS
        const std::error_condition ec = utility::details::windows_category().default_error_condition(_errCode);

        if (ec.category() == utility::details::windows_category())
            return _errCode == ERROR_WINHTTP_SECURE_FAILURE;

        return false;
#else
        const std::error_condition ec = std::system_category().default_error_condition(_errCode);

        if (ec.category().name() == genericCategoryStrg)
            return false;

        // ??? OPEN TODO:::: must test, but it will depend on the SSL version !!!
        // - patch Casablanca's ASIO code?
        // return _errCode == 336458004; //== 0x140DF114  // OpenSSL error code  ??OR?? 335544539 == 0x140000DB

        return true; // should be SSL
#endif
    }

private:
    int _errCode;
};
Note that on Windows only a single, generic SSL error code is supported by Casablanca 2.5. If we wanted to differentiate between SSL error types, we'd have to patch the Casablanca code (install a special callback handler to report the context of the HTTP error). I didn't do it, as the generic SSL error was good enough for my customer.

Now we can use this helper class like this:
  // in the catch handler for Casablanca's http_exception e:
  CCasablancaClientErrorCode clientErrCode = e.error_code().value();
  std::errc sysErrCode;
  if (clientErrCode.IsStdSystemError(sysErrCode))
  {
      switch (sysErrCode)
      {
      case std::errc::host_unreachable:
          // no connection error!
          break;
      default:
          // other error (std::errc::timed_out, std::errc::connection_aborted)
          break;
      }
  }
  else if (clientErrCode.IsSslError())
  {
      // SSL error!
  }
Having done that, the HTTPS support on the client side was ready.

* I won't explain the workings of std::error_category here; for the confused readers this was already done here (parts 1 to 5, I said it's not a piece of cake).

Empirical Results on the Static vs. Dynamic Debate?

You may know that I'm quite interested in the "Static vs. Dynamic Typing" controversy, and have already clearly declared my allegiance, basing my decision on some pretty sloppy reasoning (if not largely on gut feeling...).

But what about some hard facts, numbers, measurements and controlled experiments to decide the debate? Like the Leibnizian* "Calculemus!". Well, some good soul (@danluu) took up the burden of this task and looked** at the best known papers, talks and comparison studies. To what avail?

First there's a warning:
"If you look at the internet commentary on these studies, most of them are passed around to justify one viewpoint or another. The Prechelt study on dynamic vs. static, along with the follow-ups on Lisp are perennial favorites of dynamic language advocates, and github mining study has recently become trendy among functional programmers."
So apparently there's little interest in finding out the truth, only in defending convenient positions. But we are fearless seekers of the truth and want to know what the results are. Unfortunately the classic studies, when examined in more detail, reveal some serious methodological flaws, and the author could only conclude that:
"Other than cherry picking studies to confirm a long-held position, the most common response I’ve heard to these sorts of studies is that the effect isn’t quantifiable by a controlled experiment."
OK, pretty sobering: we still know nothing, it's all lies, damn lies and/or lacking statistical knowledge. But how come the theoretically superior static typing (and that by a simple argument: how can more support from the compiler not be better than less support?) doesn't bring any measurable improvements?

Let's reiterate the position taken by supporters of dynamic typing, in words taken from a talk at JavaZone:
"If these academic computer scientists would get out more, they would soon discover an increasing incidence of software developed in languages such as Python, Ruby and Clojure which use dynamic, albeit strong, type systems. They would probably be surprised to find that much of this software—in spite of their well-founded type-theoretic hubris—actually works, and is indeed reliable out of all proportion to their expectations."
So it seems to work in real life, and the experimental studies cannot prove the contrary. But it also looks like someone is trying hard to ignore something, don't you think so?

But then I discovered a recent paper which proposed a new angle of attack!

While the old studies tried to measure the performance of programmers (and somehow failed to design a meaningful experiment), the authors of the new one pursued a more modest goal. They introduced random changes into a program's code and then tried to compile and run it. In their words:
"Our study is based on a diverse corpus of programs written in several programming languages systematically perturbed using a mutation-based fuzz generator.
More importantly, our study also demonstrates the potential of comparative language fuzz testing for evaluating programming language designs."
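Just to illustrate the idea, here a toy sketch of mine (it assumes some valid victim.cpp to perturb; the paper's mutation fuzzer is of course more sophisticated):
  #include <cstdlib>
  #include <fstream>
  #include <iostream>
  #include <random>
  #include <sstream>
  #include <string>

  int main()
  {
      std::ifstream in("victim.cpp");          // assumption: a valid source file
      std::stringstream buf;
      buf << in.rdbuf();
      std::string src = buf.str();

      // introduce a single random "typo"
      std::mt19937 gen{std::random_device{}()};
      const std::string noise = "{}();,.+-*/";
      src[std::uniform_int_distribution<std::size_t>(0, src.size() - 1)(gen)] =
          noise[std::uniform_int_distribution<std::size_t>(0, noise.size() - 1)(gen)];

      std::ofstream("mutant.cpp") << src;

      // does the perturbed program still compile, and if so, does it still run?
      if (std::system("c++ mutant.cpp -o mutant") == 0)
          std::cout << (std::system("./mutant") == 0 ? "compiled and ran!\n"
                                                     : "compiled, but failed at runtime\n");
      else
          std::cout << "rejected by the compiler\n";
  }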
We thus could have a possibility to compare programming languages in some way? Sounds promising! So what are the results?
"The results we obtained prove that languages with weak type systems are significantly likelier than languages that enforce strong typing to let fuzzed programs compile and run, and, in the end, produce erroneous results."
The exact results are maybe interesting*** (compile = the fuzzed program still compiled, run = it also ran; the per-language labels didn't survive the formatting, only the "poster child" jab did):

   - compile: 54% / run: 31%
   - compile: 43% / run: 25%
   - compile: 48% / run: 41% (that poster child of bad language design!)
   - compile: 49% / run: 23%
   - compile: 18% / run: 15%
   - compile: 19.5% / run: 18%
   - compile: 20.5% / run: 12%

See? On average, dynamically typed programs are twice as likely to compile and run with nonsensical source code. That's definitely SOME improvement, isn't it?

On the other hand: they are testing programming languages' resilience against typos - but wasn't that the exact reason for the introduction of types into programming languages? So we know that typed languages are good at what they were invented for. But we knew that already.

Update: This blogpost sets up a different thesis: the difference between statically and dynamically typed languages isn't that important. Rather, the simplicity of the language design is decisive in order to avoid bugs! IMHO this argument is misguided, because what I wanted to know is: given two languages of the "same complexity" (problems, problems, how to measure that?), which is the better one - the one with or without static typing?

*  Nice little fact - Leibniz's program for mathematization of human knowledge was formulated in the following words:
"Actually, when controversies arise, the necessity of disputation between two philosophers would not be bigger than that between two computists. It would be enough for them to take the quills in their hands, to sit down at their abaci, and to say  (as if inviting each other in a friendly manner): Let's calculate! (Calculemus!)"
Unfortunately, this cannot be done (at least according to Turing) and flame wars still prevail.

 ** - "Static vs. Dynamic Languages: A Literature Review"

*** Caution: they are not exact, I read it out of a diagram!

Friday, 15 January 2016

Enum's forward declarations

C++ Katie (@katie_cpp :) wrote a blogpost* about the new C++11 enums. First I was reluctant to read it, but then I risked a quick look, and indeed, I learned something new.

As in the last couple of years I have been using Visual Studio most of the time, it came natural to me that I can forward declare enums:
  enum MsgType;  // we are in Visual Studio!

  class C
    bool process(MsgType msg);
but it seems to be only a nonstandard language extension, as normally it shouldn't be possible:
"It’s because the underlying type of enum is not known. The compiler needs to know how many bytes will be required to store the enum and it cannot be decided based on the forward declaration like above. It’s necessary to see the biggest value stored inside this enum."
Yessss, I can dimly recollect that in the past I couldn't do that (gcc), but then I tried once again, and Visual Studio let me get away with it. Thus the VS compiler seems to default the enum's size to int in such a case. With C++11's scoped enums, this behaviour is standardized:
"Let’s try the same with enum class. Now it works! ... The underlying type of enum class is implicitly known by the compiler (it’s int by default)."
As a corollary to the compiler's conundrum with plain enums, there's a catch I wasn't aware of (or didn't think of at first; after some hours of debugging I'd probably have gotten it):
"Another consequence of unknown underlying type appears when the C++98 enum should be a struct member. The struct size can be different when the code is compiled on various machines!"
That can be a source of some nasty, hard to find bugs, believe me!

And there's a second thing I didn't know, namely that it's possible to specify the underlying type in a forward declaration of a plain enum (just like for the new "scoped" enums):
  enum MsgType : int;

  class C
  {
      bool process(MsgType msg);
  };
Did you know that? I didn't.
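To put both pieces together in one compilable sketch (my example, not taken from Katie's post):
  enum class MsgType;          // scoped enum: underlying type defaults to int,
                               // so the type is complete right here
  enum Color : unsigned char;  // C++11: plain enum, OK because the type is given

  class C
  {
  public:
      bool process(MsgType msg);   // fine, the compiler knows sizeof(MsgType)
  };

  // the full definitions may come much later (e.g. in another header):
  enum class MsgType { Connect, Data, Disconnect };
  enum Color : unsigned char { Red, Green, Blue };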

At first I thought it was already a C++98 feature** and was a little bit ashamed of myself - all those years doing C++98 and you didn't know that. Ooops! But then I looked it up and saw that it is a new C++11 feature. So I'm still learning C++11, apparently.


All in all a nice post*, <praise> not simply reiterating some new syntax/feature, like so many other bloggers, but also giving some juicy technical bits away! </praise>

Recently, I saw this information on SO: "Microsoft's compiler has allowed specifying the underlying type of an enum as a proprietary extension since VS 2005."

* All citations are from that post.

** that could be a single critique point for the above blogpost... but a small one!

Sunday, 29 November 2015

OCaml and Multithreading

As Doctor House said one day: "There was that philosopher Ocaml..." Stop, wrong start! Try again.

1. OCaml story

OK, some time ago, in a somehow better world, I became attracted to the OCaml language because of its speed, elegance, light syntax, algebraic data structures and automatic type inference (as you meanwhile might know, I'm a bit of a typing fan). I must admit, it was my first functional language, so maybe I'm a little sentimental about it. Later on I somehow didn't give it much love and affection, mostly because I got distracted by other, more fashionable languages like Clojure, F# and Haskell (shame on me!).

Nonetheless, although OCaml is rather an obscure language, one company has been using it big time to do some highly parallel, high-performance data processing. This company is Jane Street, and they are a financial market player. The reason they chose OCaml was that the owners wanted to read every line of code performing financial transactions*:
"Early on, a couple of the most senior traders (including one of the founders) committed to reading every line of code that went into the core trading systems, before those systems went into production"
"In 2003, Jane Street began a rewrite of its core trading systems in Java. The rewrite was eventually abandoned, in part because the resulting code was too difficult to read and reason about—far more difficult, indeed, than the VBA that was being replaced."
 "The VBA code was written in a terse, straight-ahead style that was fairly easy to follow. But somehow when coding in Java we built up a nest of classes that left people scratching their heads when they wanted to understand..."
 "In 2005, emboldened by the success of the research group, Jane Street initiated another rewrite of its core trading systems, this time in OCaml."
So somehow the functional-style code was more readable than a pile of Java classes: can we put that down to the superiority of functional languages in general, or just to bad training of the Java programmers? An open question.

OK, there are additional technical reasons as well: the type inference that makes the code concise (thus more readable) while retaining type safety. Additionally, its runtime performance is pretty good! A double win against, for example, Python.

2. Multithreading?

For a long time they remained the single serious industry user I knew of, but recently Facebook also started using OCaml in Hack and Flow, their typing add-ons for PHP and Javascript. Surprise! And a very nice one, my pet language got appreciated at last! Everything OK?

Not quite: when I listened to that presentation, I was surprised to learn that OCaml's multithreading support is pretty much completely broken. How can that be? Why didn't I notice that in the first place?

Well, after some more digging, I received this link from a fellow programmer. It makes it pretty clear to us all that the OCaml designers were quite a nasty bunch of "multicore deniers":
"The goals of OCaml threads are (2) and (3) but not (1)"
Where (1) denotes "Parallelism on shared-memory multiprocessors"! And then:
"Shared-memory multiprocessors have never really "taken off", at least in the general public.  For large parallel computations, clusters (distributed-memory systems) are the norm. For desktop use, monoprocessors are plenty fast."
 "What about hyperthreading?  Well, I believe it's the last convulsive movement of SMP's corpse :-)"
Yes, they didn't like it, to say the least. But there's another reason as well: the GC implementation in OCaml is single-threaded and thus could be made very fast, which was an important factor in OCaml's initial successes. But then this (ironically) proved detrimental when trying to make the language MT-safe**.

The only way to do parallel processing in OCaml is thus multiprocessing + message passing. As I learned, Jane Street wrote the Parallel library to support that:
"Parallel is a library for spawning processes on a cluster of machines, and passing typed messages between them. The aim is to make using another processes as easy as possible. Parallel was built to take advantage of multicore computers in OCaml, which can't use threads for parallelism due to it's non reentrant runtime."
The Facebook people went one step further: they
"...use the same model for multithreading: a specially mmap'd region shared between different fork'd processes, containing a shared, lockless hash table. "
Thus they have multithreading without threads: a couple of processes sharing the memory space (or a part of it), faking the SMP model! You must admit it's rather brilliant - you spare yourself all the serialization, deserialization, and message passing that normally shaves off a significant part of the performance.
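The mechanism itself is old Unix fare; stripped to the bone it looks like this (my C++/POSIX sketch, nothing Facebook-specific, and no lockless hash table):
  #include <sys/mman.h>
  #include <sys/wait.h>
  #include <unistd.h>
  #include <iostream>

  int main()
  {
      // an anonymous, shared mapping stays shared across fork()
      void* mem = mmap(nullptr, sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
      int* shared = static_cast<int*>(mem);
      *shared = 0;

      if (fork() == 0)     // child process: write into the shared region
      {
          *shared = 42;
          _exit(0);
      }

      wait(nullptr);       // parent: wait for the child...
      std::cout << "child wrote: " << *shared << "\n";   // ...and read: prints 42
      munmap(shared, sizeof(int));
  }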

3. So why OCaml?

Apparently, despite the multithreading disaster, companies find OCaml worth trying out. Why? I'd say the reasons are:

  1. It supports functional style (but it allows dirty tricks when they must be done)
  2. Has strong typing unlike Lisp, Python, etc.
  3. Not bound to Windows and .NET like F#
  4. Not Haskell, you'll be able to read and understand it.***

And finally, maybe this whole multithreading business isn't worth the hassle? Facebook clearly needs it, but Jane Street doesn't. I remember having heard Yaron Minsky (of Jane Street) saying somewhere that parallel processing plus message passing is a much safer model than multithreading. And in finance you need that safety.

I can accept this argument in a more general setting as well, because the only reason we have threads is performance gains (OK, and a deceptively "natural" programming model too). And if the performance of forking + message passing is sufficient, that's definitely a gain! Jane Street does some massive parallel processing in real time, so the performance seems to be not that bad after all...

Or maybe the fork+pass model is slower, but they are gaining it back on the excellent low-latency GC? Jon Harrop puts it like this:
"OCaml has a nice (~10ms) low latency GC and it is easy to optimise OCaml code for low latency. In contrast, the .NET GC is much more complicated and, therefore, harder to optimise for and, in my experience, has >10x higher latency. I suspect that would be a major drawback for Jane Street."
So before you port OCaml's GC to multicore, overcomplicating it and losing its excellent performance, maybe paying some price for the workarounds pays off better?

Having said that, there's recent work on improving the GC performance even more, and even a new attempt at multicore OCaml. Maybe this time they will be successful. And be wary, not everyone likes OCaml - for some, "OCAML sucks"!

Update: an impressive list of companies using OCaml: (via @jonharrop). So it's not only Jane Street and Facebook!
Update 2: Facebook continues working on the OCaml ecosystem - Reason, a "new interface to OCaml" ... "provides a new syntax and toolchain for editing, building, and sharing code". Plus Infer - the iOS/Android app checking tool!

* "OCaml for the masses" -

** "The rise and fall of OCaml" -

*** As someone said, Haskell was created to do research on Haskell (and OCaml was created to write theorem provers).