Tuesday, 6 December 2016

Two interesting quirks of C++ uniform initialization


We all know and love C++11's new uniform initialization feature (aka curly-brace initialization), and, to be frank, there is much to warrant this love, for example:

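with structs (a default member initializer like x{false} makes X a non-aggregate in C++11, so the brace forms below need C++14 or a matching constructor):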
  struct X { bool x{false}; int i; ...};
  X x = { true, 100, ... };
  X x{ true, 100, ... };
with arrays:
  struct Y { int arr[3]; bool b; ...};
  Y y{ {0,1,2}, false, ... };
  int* intArray = new int[3]{1, 2, 3};
with library classes:
  std::vector<int> a = { 1, 2, 4 }; // yesss! At last!
  QMap<QString, QVector<int>> a = { {"0x111", { 1, 1, 1} }, {"0x100", { 1, 0, 0} } };
and you can enable it for your own classes as well, by writing a constructor that takes a std::initializer_list as its argument!
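A minimal sketch of such a constructor could look like the following. The class IntBag and its members are made up for illustration only; the one standard piece is std::initializer_list itself.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical class, invented for this example.
class IntBag {
public:
    // A braced list of ints selects this constructor.
    IntBag(std::initializer_list<int> values) : values_(values) {}

    std::size_t size() const { return values_.size(); }

    int sum() const {
        int total = 0;
        for (int v : values_) total += v;
        return total;
    }

private:
    std::vector<int> values_;
};

// Both brace forms end up in the initializer_list constructor.
inline int example_sum() {
    IntBag a = { 1, 2, 4 };
    IntBag b{ 10, 20 };
    assert(a.size() == 3);
    return a.sum() + b.sum();
}
```

Taking the list by value is the usual choice here, since a std::initializer_list is only a lightweight view of the underlying array.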

And because it's universal, you can use it for built-in types too:
  int i{1};
  int j{};
but it's a bit redundant here, as we could already do:
  int i(1);
  int j(0); // not "int j();"! That declares a function. I never used it, but had assumed it would give a default-initialized int :(
But there are two additional goodies packed into it, as I learned recently:

1. The first one
This is an old problem: unexpectedly, the compiler will consider this:
  TimeKeeper time_keeper();
not to be an object instantiation but a function declaration! Here TimeKeeper is a class (reused) from the Wikipedia article:
  class TimeKeeper {
    public:
      TimeKeeper();
      int get_time();
  };
The compiler will balk at this:
  int t = time_keeper.get_time();
But, thanks to uniform initialization, not at this:
  TimeKeeper time_keeper1{};
  int t = time_keeper1.get_time();
OK, you are right, that's not the most vexing parse 😉, but simply incorrect usage of the constructor! The most vexing parse requires a parameter to the constructor! But I made this error several times myself when blindly typing ahead... Nonetheless, the problem is the same, only with a parameter:
  TimeKeeper time_keeper(Timer());
This declares a function taking a function(!) like Timer mkTimer() as its single, unnamed(!) parameter. Vexing? Now correct that with a single stroke (or two):
  TimeKeeper time_keeper{Timer()};
Nice to know when you need a workaround for a vexing parse!
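To make the difference visible, here is a small sketch that lets the compiler itself confirm what each line declares. Timer and the TimeKeeper bodies are stand-ins I made up (the Wikipedia snippet only shows declarations), and the value 42 is arbitrary.

```cpp
#include <cassert>
#include <type_traits>

// Stand-in definitions; the bodies are assumptions for this example.
struct Timer {};

class TimeKeeper {
public:
    TimeKeeper() = default;
    explicit TimeKeeper(const Timer&) {}
    int get_time() const { return 42; }
};

inline int vexing_parse_demo() {
    // Parsed as a function declaration: returns TimeKeeper and takes a
    // (pointer to a) function returning Timer. Compilers may warn here.
    TimeKeeper declared_function(Timer());
    static_assert(std::is_function<decltype(declared_function)>::value,
                  "parentheses: this is a function declaration");

    // With braces there is nothing to misparse: this is an object.
    TimeKeeper real_object{Timer()};
    static_assert(std::is_same<decltype(real_object), TimeKeeper>::value,
                  "braces: this is a TimeKeeper object");

    assert(real_object.get_time() == 42);
    return real_object.get_time();
}
```

The static_asserts are only there to prove the point at compile time; in real code you would simply write the braced version.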

2. The second one
Here Bjarne himself explains that:
  int x = 7.3; // Ouch!
but
  int x0 {7.3}; // error: narrowing
  int x1 = {7.3}; // error: narrowing
Moreover, the compiler will check at compile time whether an integer constant fits into the target type (in Bjarne's words again):
  char c1{7}; // OK: 7 is an int, but it fits in a char  
  char c2{77777}; // error: narrowing (assuming 8-bit chars)
That's nice.
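Since a narrowing error stops compilation, it is hard to show in a runnable snippet. One way around that is a small detection trait: the sketch below (the name brace_initializable is my own invention) asks the compiler whether To{value} would be accepted for a non-constant value of type From, relying on the fact that a narrowing conversion in braces makes the expression ill-formed.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical helper trait: true if To{from} compiles for a
// non-constant value of type From.
template <class To, class From, class = void>
struct brace_initializable : std::false_type {};

template <class To, class From>
struct brace_initializable<To, From,
                           decltype(void(To{std::declval<From>()}))>
    : std::true_type {};

static_assert(brace_initializable<int, int>::value,
              "int -> int is fine");
static_assert(brace_initializable<double, float>::value,
              "float -> double loses nothing");
static_assert(!brace_initializable<int, double>::value,
              "double -> int is narrowing");
static_assert(!brace_initializable<char, int>::value,
              "a non-constant int -> char is narrowing");
```

Note the last line: char c1{7} from the example above is only accepted because 7 is a constant that fits; an int variable with the same value would be rejected.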

Admittedly, I am a traditionalist and like my code to look like old, regular C++, but these features make a nice argument in favor of using curly braces instead of the normal ones! I will for sure consider that!


Saturday, 3 December 2016

N3599 proposal, typesafe printf() and some C++ explanations



Reading the C++ Tips, 2016 Week 46, I stumbled over the following syntax:
  template <typename T, T... chars> 
  constexpr CharSeq<chars...> operator""_lift() { return { }; }
Only to be instructed in the comments section that this is a non-standard extension of user-defined literals that Clang and gcc nonetheless seem to support (still learning C++14, as you can see). I wondered if my not-so-new Visual C++ 2013 compiler would support it too, so I had a closer look at it.

I soon found out* that there's a proposal for this, namely N3599. This proposal being quite old (March 2013), I thought that my chances weren't that bad... So I took the first example usage of the feature I found in N3599 and tried to compile it in Visual Studio. Result? Not supported, of course; the "rejuvenation" of the Microsoft compiler is still underway**.

At this point I got quite interested in the code itself and in the possibility of getting it to run even without the templated user-defined literals. So here it comes, my explanation of how the code works, because it accomplishes a non-trivial feat: generating a function of a given type from a given string (i.e. from a textual description!).

But wait, type-safe printf? Didn't Bjarne explain it somewhere already***? I think I heard something like this, but didn't check it... Stop, let us stay with the original, humble goal of understanding a piece of template code which wasn't entirely clear at first sight.

1. Template code analysis and explanation

The original code was:
// A tuple of types.
template<typename ...Ts> struct types {
  template<typename T> using push_front = types<T, Ts...>;
  template<template<typename...> class F> using apply = F<Ts...>;
};

// Select a type from a format character.
template<char K> struct format_type_impl;
template<> struct format_type_impl<'d'> { using type = int; };
template<> struct format_type_impl<'f'> { using type = double; };
template<> struct format_type_impl<'s'> { using type = const char *; };
// ...
template<char K> using format_type = typename format_type_impl<K>::type;

// Build a tuple of types from a format string.
template<char ...String>
struct format_types;
template<>
struct format_types<> : types<> {};
template<char Char, char ...String>
struct format_types<Char, String...> : format_types<String...> {};
template<char ...String>
struct format_types<'%', '%', String...> : format_types<String...> {};
template<char Fmt, char ...String>
struct format_types<'%', Fmt, String...> :
  format_types<String...>::template push_front<format_type<Fmt>> {};

// Typed printf-style formatter.
template<typename ...Args> struct formatter {
  int operator()(Args ...a) {
    return std::printf(str, a...);
  }
  const char *str;
};

template<typename CharT, CharT ...String>
typename format_types<String...>::template apply<formatter>
operator""_printf() {
  static_assert(std::is_same<CharT, char>(), "can only use printf on narrow strings");
  static const CharT data[] = { String..., 0 };
  return { data };
}

void log_bad_guess(const char *name, int guess, int actual) {
  "Hello %s, you guessed %d which is too %s\n"_printf(
    name, guess, guess < actual ? "low" : "high");
}
Not quite easy to read, is it? But let us go through it step by step, using the old trusty top-down method.


1. The log_bad_guess() function applies the custom _printf literal operator to the format string, and then... seems to be calling the string itself with more parameters??? Whassat? Are we moving towards Haskell-like unreadability?

2. Maybe not. If the result of applying a literal operator to a string can be called, then the operator must return a callable, i.e. something with an operator(). Look at _printf's definition: it is a template returning something complicated. This something must then define the call operator, so let's look for it.

3. This something is created by "calling" the apply "method" of a format_types structure with the formatter template. Of course all of this happens at the metaprogramming, i.e. compile-time, i.e. types-only level, so we need a short explanation here:

The apply metaprogramming "method" is one of the basic type manipulation primitives, and it can be written quite simply with the new C++11 parameter packs. It just instantiates a given template with the types stored in the list. In our case, it creates a correctly typed version of the formatter function object. Correctly typed means that its parameter types match the types required by the format string. These types are provided by the format_types type list. A type list is implemented as a template parameter pack.

The other utility is the push_front "method": it just extends a type list with a new type at the front. Cool.

BTW, there is a nice, short and free ebook (by @joel_f and @edouarda14) about modern C++ metaprogramming explaining such basic operations if you want to learn more. So have a look at it; with parameter packs, type list manipulations got so much easier in C++!

4. So how is format_types generated? This is done with a series of template specializations which descend the format string recursively. A pair of characters starting with % is consumed in one step and an appropriate type is added to the type list (%% is skipped as an escaped percent sign); any other character is skipped on its own. The type lookup is done by the appropriately specialized format_type_impl template, which is then aliased to format_type so as to directly access its type "member".

5. Now we have all the elements: _printf generates a callable with a function signature derived from the parsed format string, and then we call it with matching parameters. Was it that difficult? Not really, right?
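To check this reading of the code, the type-level part can be compiled on its own, without any literal operator. The sketch below repeats the types, format_type and format_types definitions from above and applies std::tuple instead of formatter (my substitution, chosen only because a tuple type is easy to compare); the alias names and the static_asserts are additions for illustration.

```cpp
#include <tuple>
#include <type_traits>

// Type list with the two "methods" discussed above.
template<typename ...Ts> struct types {
  template<typename T> using push_front = types<T, Ts...>;
  template<template<typename...> class F> using apply = F<Ts...>;
};

// Format character -> argument type.
template<char K> struct format_type_impl;
template<> struct format_type_impl<'d'> { using type = int; };
template<> struct format_type_impl<'f'> { using type = double; };
template<> struct format_type_impl<'s'> { using type = const char *; };
template<char K> using format_type = typename format_type_impl<K>::type;

// Recursive descent over the characters of the format string.
template<char ...String> struct format_types;
template<> struct format_types<> : types<> {};
template<char Char, char ...String>            // ordinary character: skip it
struct format_types<Char, String...> : format_types<String...> {};
template<char ...String>                       // "%%": escaped percent sign
struct format_types<'%', '%', String...> : format_types<String...> {};
template<char Fmt, char ...String>             // "%d", "%s", ...: record a type
struct format_types<'%', Fmt, String...> :
  format_types<String...>::template push_front<format_type<Fmt>> {};

// "%s=%d" should yield the parameter list (const char*, int).
using name_and_number =
    format_types<'%', 's', '=', '%', 'd'>::apply<std::tuple>;
static_assert(std::is_same<name_and_number,
                           std::tuple<const char *, int>>::value,
              "%s=%d maps to (const char*, int)");

// "5%%" contains no conversion at all.
using no_arguments = format_types<'5', '%', '%'>::apply<std::tuple>;
static_assert(std::is_same<no_arguments, std::tuple<>>::value,
              "an escaped percent sign adds no parameter");
```

Replacing std::tuple with formatter gives exactly the return type of the _printf operator.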

2. Getting it running with the VS 2013 compiler

Alas, VS 2013 doesn't like the template<...> operator "" _printf construct: .... What can we do? I tried the following changes:
template<typename CharT, CharT ...String>
typename format_types<String...>::template apply<formatter>
 printf() {
    static_assert(std::is_same<CharT, char>::value, "can only use printf on narrow strings");
    static const CharT data[] = { String..., 0 };
    return{ data };
}
You see, now we have a function printf() returning a function object (thus it's a HOF, a higher-order function of sorts), which is generated from the format string, as we have seen above.

Note that I had to change std::is_same<...>() to std::is_same<...>::value, as the VS 2013 compiler does not support constexpr. Also, operator() should be const here!

Then I wanted to use it like this:
void log_bad_guess(const char *name, int guess, int actual) {
    auto p = printf<char, "Hello %s, you guessed %d which is too %s\n">();
    p(name, guess, guess < actual ? "low" : "high");
}
but... compiler error! It seems the compiler cannot match a string literal against the CharT ...String parameter pack. Hmm, let's try this:
void log_bad_guess(const char *name, int guess, int actual) {
    auto p = printf<char, '%', 's', '%', 'd', '%', 's'>();
    p(name, guess, guess < actual ? "low" : "high");
}
Yesss, now it's compiling! So if we change the last format specifier to %d, like this:
    auto p = printf<char, '%', 's', '%', 'd', '%', 'd'>();
we should get a compiler error, preferably something like: "Cannot call XXX with YYY parameters". What VS 2013 reports, however, is:
  error C2664: 'int formatter<format_type_impl<115>::type,format_type_impl<100>::type,format_type_impl<100>::type>::operator ()
(format_type_impl<115>::type,format_type_impl<100>::type,format_type_impl<100>::type)' : cannot convert argument 3
from 'const char *' to 'format_type_impl<100>::type'
OK, talk about compiler template error messages... This could probably be improved with some judicious usage of static_assert, but here we are only seeking understanding, not production code quality. But we know what is going on, right?

So there is one final touch missing: the automatic conversion from a string literal to a char parameter pack. Well, that's the problem! The proposed (missing) user-defined literal operator would do exactly that! As one of my fellow bloggers said:
"But a similar template syntax is standardized for raw numeric literals, so the lack of raw string user defined literals in C++14 is a pretty big inconsistency that I hope C++1z will resolve."
So we must implement this ourselves, using a technique similar to one of the integer_sequence implementations here, then just pass the string length via the T (&)[N] syntax, and then it should work. Or use one of the modern C++ metaprogramming libraries like Brigand or Hana...
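Just to make the idea tangible, here is a rough sketch of such a conversion. It assumes a C++14 compiler (std::index_sequence and constexpr functions), so it would not help with VS 2013, and it computes the length with a small recursive constexpr function rather than via T (&)[N]. All names (char_seq, literal_length, explode, to_char_seq, guess_format) are invented for this example.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// A compile-time list of characters (hypothetical name).
template <char... Chars> struct char_seq {};

// Length of a string without the terminating '\0', written recursively
// so that it is also a valid C++11 constexpr function.
constexpr std::size_t literal_length(const char* s) {
    return *s ? 1 + literal_length(s + 1) : 0;
}

// Expands Holder::text()[0], Holder::text()[1], ... into a char pack.
template <class Holder, std::size_t... Is>
constexpr char_seq<Holder::text()[Is]...>
explode(std::index_sequence<Is...>) {
    return {};
}

template <class Holder>
using to_char_seq = decltype(explode<Holder>(
    std::make_index_sequence<literal_length(Holder::text())>{}));

// A string literal cannot be a template argument itself,
// so it is wrapped in a type.
struct guess_format {
    static constexpr const char* text() { return "%s%d"; }
};

static_assert(std::is_same<to_char_seq<guess_format>,
                           char_seq<'%', 's', '%', 'd'>>::value,
              "the literal was unpacked character by character");
```

The resulting character pack could then be forwarded to format_types. The wrapper struct per format string is clumsy, which is precisely the boilerplate the proposed literal operator would remove.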

I won't implement the character sequence generation here, because this post is very long already. The second option is out of the question as well, because I want to use only the "naked" C++ standard library... Thus our goal won't be fully achieved in this blogpost. Sorry 😒!

3. Summary

You might be asking "why are you writing that"? It's nothing earth-shattering, only some regurgitated, already known stuff... My explanation: maybe somewhere there's a (young) programmer looking at code like that and thinking "OMG, I'm never going to understand that, it's too advanced, it's magic". And the entire industry is reinforcing this view. I am against such elitism: it's just a piece of code doing some work, humans wrote it, so another human can understand it. I feel the same about the category theory cargo cult (read here): don't fear, don't let yourself be discouraged by the general sentiment in the programming industry, don't believe the 10x programmer fairy tales. It's only computing, and it boils down to if, then, else and loop 😊.


So what did we achieve? We just synthesized a function signature from a textual description, and then called a vararg function with parameters of those types. Higher-order functions, anyone?



--
* through Sumant Tambe's blogpost "Dependently-typed Curried printf in C++". You may wonder here what that "dependently-typed" moniker might mean... In the course of demythologizing functional programming ;) I dare say that a dependent type is a type that depends on some value (a non-type parameter, aka tag). And I hope I'm right here, because I didn't google it, I only restate what I remember about Idris... Wish me luck 😉.

** Why there is no expression SFINAE yet, etc.; a quite interesting piece of MS compiler history: Rejuvenating the Microsoft C/C++ Compiler

*** Python Style printf for C++ with pprintpp: this goes in a similar vein, but the code is somehow very complex, maybe owing to the fact that pairs of curly braces have to be found (Python syntax). Additionally, it needs macros. Wouldn't it be nice if N3599 solved that?

Wednesday, 27 July 2016

Structure-like std::tuple usage


While reading P0095R0, WG21* I stumbled upon a cool trick. If you define the following structures:
  struct x { double value; };
  struct y { double value; };
  struct z { double value; };

  using point = std::tuple<x, y, z>;
you will then be able to write (for some point p):
  auto px = std::get<x>(p);
  auto py = std::get<y>(p);
  auto pz = std::get<z>(p);
Look at that! Now we can use std::get<x> to fetch the 'x' value of the tuple, std::get<y> for 'y' and so on, instead of the plain, old (and ugly...):
  using point = std::tuple<double, double, double>;

  auto px = std::get<0>(p);
  auto py = std::get<1>(p);
  auto pz = std::get<2>(p);
Cool, isn't it?
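For the record, this is how the trick looks as a complete, compilable snippet. It needs C++14, because std::get by type was only added there; the coordinate_sum helper is made up just to have something to call. Note that std::get<x> returns the wrapper object, so the double itself sits behind .value.

```cpp
#include <cassert>
#include <tuple>

// The wrapper types double as field names.
struct x { double value; };
struct y { double value; };
struct z { double value; };

using point = std::tuple<x, y, z>;

// Hypothetical helper showing named read access.
inline double coordinate_sum(const point& p) {
    return std::get<x>(p).value + std::get<y>(p).value + std::get<z>(p).value;
}

inline double tuple_demo() {
    point p{x{1.0}, y{2.0}, z{3.0}};
    std::get<z>(p).value = 4.0;   // named write access works as well
    assert(std::get<y>(p).value == 2.0);
    return coordinate_sum(p);
}
```

This only works as long as every wrapper type occurs exactly once in the tuple; std::get by type does not compile otherwise.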

But the author dismisses such tricks on the spot:
"Should we use this approach everywhere and deprecate the use of struct in any context? In the author's opinion we should not. The use of wrapper types is much more complicated to both read and understand than a plain struct. For example, the wrapper types that were introduced, such as the 'x' type, make little sense outside of their corresponding tuples, yet they are peers to it in scope. Also, the heavy syntax makes it difficult to understand exactly what is intended by this code."
...on the premise that it is too clever. Come on, a little bit of extra code in one place, and in exchange all that readability throughout your source code? But wait, it gets even harsher:
"While the utility of type selection and SFINE for visitors is quite clear for advanced C++ developers, it presents significant hurdles for the beginning or even intermediate developer. This is especially true when it is considered that the visit function is the only way to guarantee a compilation error when all cases are not considered."
That got me thinking. First I wanted to dismiss the argument along the lines of: "what, how can a C++ developer not know what SFINAE is...", but then another thought grew stronger and stronger: "come down from the expert-only, high-church, pure-blood stance, C++ should be usable for beginners too!". Just think about yourself trying to learn a new programming language for a minute. Do you want it esoteric and complicated? Not really. So this is really a show stopper, kind of...

But we don't want to give up our freshly discovered syntactic candy so easily, do we?

Thus, I think we need a new syntax extension for C++17 (or C++0y?)**:
  using point = std::tuple<x : double, y : double, z : double>;

  auto px = std::get<x>(p);  // p is a point
  auto py = std::get<y>(p);
  auto pz = std::get<z>(p);
Nowadays everybody gets their pet feature added to C++17 (just look at the if and switch initializers), so why not?

Come on guys! Who's with me? 


--
* "The Case for a Language Based Variant", C++ Standard proposal P0095R0. It discusses advantages of language based variant implementation vs. a library-only solution.

** of course we could use the brand-new destructuring (structured bindings), like in:
  using point = std::tuple<double, double, double>;

  point p{1.0, 2.0, 3.0};
  auto [x, y, z] = p;
but that's not what we want. We need named access to single elements of the tuple. And don't say we shouldn't use a tuple, because we simply want to!

Friday, 15 July 2016

A fairytale found on Stack Overflow


Somewhere on the Internets there is that little fairy tale about a little princess*, written in the language of dreams, i.e. Javascript:

  function princess() {

      var adventures = [];

      function princeCharming() { /* ... */ }
  
      var unicorn = { /* ... */ },
          dragons = [ /* ... */ ],
          squirrel = "Hello!";

      return {
          story: function() {
              return adventures[adventures.length - 1];
          }
      };
  }

  var littleGirl = princess();
  littleGirl.story();

Can you read it? If not, there is this attempt at a translation by Patrick M:
"...the princess() function is a complex scope containing private data. Outside the function, the private data can't be seen or accessed. The princess keeps the unicorns, dragons, adventures etc. in her imagination (private data) and the grown-ups can't see them for themselves. BUT the princess's imagination is captured in the closure for the story() function, which is the only interface the littleGirl instance exposes into the world of magic. – Patrick M Feb 28 '13 at 7:49"  
Still an alien language? Learn some Javascript closures, would you? And then a story unfolds:
Once upon a time, there was a princess whose world was full of adventures: dragons, unicorns, talking squirrels (Narnia?). But because in this world she was only a little girl, she could only tell stories about her adventures. And nobody would believe her, because she couldn't take anybody into her magical land (due to a curse of an old, ugly witch called Ecma). So the princess was both happy and sad, because she couldn't share her adventures with anyone. And that's the tragedy of sorcery: you always have to pay a price for it...
Now that you are in the know, did you notice that the princess doesn't tell anyone about Prince Charming?

--
* source: https://stackoverflow.com/questions/111102/how-do-javascript-closures-work/6472397#6472397

Sunday, 5 June 2016

Converting shared_ptr of Derived to shared_ptr of Base


Years and years ago, as C++ templates were gaining prominence, a common problem users had with them was understanding how the types of particular template instantiations were related. E.g. given the following classes:
  class Base {};
  class Derived : public Base {};

  SomeTempl<Base> tb;
  SomeTempl<Derived> td;
it was somehow non-intuitive that the class of the td object was not a subclass of the class of tb. But well, that wasn't that different from the basic language rules for array types:
  Base ab[100];
  Derived ad[100];
Here the type of ab isn't in any way related to the type of ad. So some users wished that templates were somehow related through their parameters, but it was an easy task to persuade them that they were plain wrong. And taking into account that in those distant times templates were mostly used as an implementation helper for defining containers, that answer wasn't even that wrong.

Well, until...

You guessed it, until C++11 arrived, and the "C++ renaissance" and "modern C++" were announced. As you might know, in "modern" C++ you shouldn't use naked pointers; instead, you use the appropriate smart pointer class: unique_ptr<>, shared_ptr<> or weak_ptr<>.

Ooops, now we don't have a container on our hands! It's rather a kind of decorator, taking a type, and creating a new, but semantically not really different one. What I mean here is, that in this case, we have a warranted need for template types to be related based on their arguments! I.e.:
  unique_ptr<Base> pb;
  unique_ptr<Derived> pd;
the type of pd should behave like a subclass of the type of pb, because the intention of the unique_ptr<> "decorator" template is to preserve the pointer semantics of its argument type! So what now? Language rules for templates cannot be changed, the types cannot be related, so if you want to write a covariant member function using pointer types, like here:
  class X
  {
  public: virtual Base* calculemus();
  };

  class Y : public X
  {
  public: Derived* calculemus() override;
  };
you cannot simply replace the raw pointers with smart pointers. That brought us a slew of Stack Overflow questions of the general type "How to convert shared_ptr<> to...".

So indeed, what can be done? How to convert one to another? As I said before, template rules cannot be changed at this point. So if not in the language definition, maybe we can solve that in the library?

Well, let's try. If we take, for example, a close look at the definition of unique_ptr<T> in the C++11 standard (through http://en.cppreference.com/w/cpp/memory/unique_ptr/unique_ptr this time) we can find the following converting constructor:
template< class U, class E >
unique_ptr( unique_ptr<U, E>&& u );    (6)
....
6) Constructs a unique_ptr by transferring ownership from u to *this
....
This constructor only participates in overload resolution if all of the following is true:
  a) unique_ptr<U, E>::pointer is implicitly convertible to pointer
  b) U is not an array type
  c) Either Deleter is a reference type and E is the same type as D, or Deleter is not a reference type and E is implicitly convertible to D
We see that the converting constructor is enabled via SFINAE only when the pointer conversion is sound (probably with std::is_convertible or something like that). Nice!

So the right way of converting a unique_ptr<Derived> to a unique_ptr<Base> is simply to move the former into the latter! I knew the C++ committee wouldn't leave us in the lurch!
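A short sketch of both conversions, for unique_ptr and for shared_ptr (which the title promised). Base and Derived get a virtual destructor and an id() function here purely for the sake of the example.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Example hierarchy; id() is only there to observe the dynamic type.
struct Base {
    virtual ~Base() = default;   // needed to delete a Derived through Base
    virtual int id() const { return 1; }
};
struct Derived : Base {
    int id() const override { return 2; }
};

// unique_ptr: ownership is moved through the converting constructor.
inline int unique_ptr_conversion() {
    std::unique_ptr<Derived> pd(new Derived);
    std::unique_ptr<Base> pb = std::move(pd);
    assert(pd == nullptr);       // pd gave up ownership
    return pb->id();
}

// shared_ptr: a plain copy is enough, both pointers share ownership.
inline long shared_ptr_conversion() {
    std::shared_ptr<Derived> sd = std::make_shared<Derived>();
    std::shared_ptr<Base> sb = sd;
    return sb.use_count();
}

// The SFINAE constraint at work: only the sound direction converts.
static_assert(std::is_convertible<std::unique_ptr<Derived>,
                                  std::unique_ptr<Base>>::value,
              "Derived -> Base is allowed");
static_assert(!std::is_convertible<std::unique_ptr<Base>,
                                   std::unique_ptr<Derived>>::value,
              "Base -> Derived is rejected");
```

For the opposite direction with shared_ptr there are the explicit std::static_pointer_cast and std::dynamic_pointer_cast helpers.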

PS: And if you want to know how to use that possibility to somehow emulate covariant return types through smart pointers, have a look at Arne Mertz's blogpost: http://arne-mertz.de/2016/05/covariant-smart-pointers/

Tuesday, 29 March 2016

HTTPS Support for a Casablanca Client


This post is a continuation of a previous one about SSL and Casablanca, the C++ REST API library by Microsoft. There we added SSL support to a Casablanca server. This time we'll do it on the client side. At first sight it seems to be simple, just write:
  web::http::client::http_client restClient(U("https://xxx.yyy.zzz"));
  auto response = restClient.request(web::http::methods::GET);
and go on! As we saw, the only thing to do is to use an HTTPS URI. Nothing more to do? Not really! The main problem when adding SSL support to the client turned out to be the error handling. More specifically, I wanted to differentiate between an SSL error and a simple no-connection error.

Casablanca uses the new standard std::error_condition/error_category classes for that. The problem is that these classes are still poorly understood by programmers (at least by me, because they are so new)*, and, which is much worse, that their usage by Casablanca (as of v.2.5.0) isn't consistent between Windows and Linux code!

On Windows, there is a special windows_category defined for system errors, while for Linux the standard std::system_category is used. Worse, the error codes forwarded there come from Boost ASIO, which in turn just forwards OpenSSL's error codes, which again depend on the library version :-/.
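As a reminder of how the standard pieces fit together, here is a tiny sketch using only <system_error>, with no Casablanca code involved. The helper name is_portable_error is my own; the idea is that an error code is "portable" when its default error condition lives in std::generic_category, and that such codes can be compared directly against std::errc values.

```cpp
#include <cassert>
#include <system_error>

// Hypothetical helper: does this code map to a portable, POSIX-style
// condition, i.e. one in the generic category?
inline bool is_portable_error(const std::error_code& ec) {
    return ec.default_error_condition().category() == std::generic_category();
}

inline void error_condition_demo() {
    // make_error_code(std::errc) creates a code in the generic category.
    const std::error_code unreachable =
        std::make_error_code(std::errc::host_unreachable);

    assert(is_portable_error(unreachable));
    assert(unreachable == std::errc::host_unreachable);
    assert(unreachable != std::errc::timed_out);
}
```

Categories are compared by identity here, which avoids depending on the spelling of category names.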

To tame this chaos a little, the following little helper class was born:
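// Assumption: genericCategoryStrg is not shown in the original post; presumably it holds
// the name reported by std::generic_category(), which is "generic".
static const std::string genericCategoryStrg = "generic";

class CCasablancaClientErrorCode 
{
public:
  CCasablancaClientErrorCode(int errCode) : _errCode(errCode) {}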
 
  /**
    Check and translate to POSIX-conforming system error code.

    For HTTP connection problems following codes are used by Casablanca:
        std::errc::host_unreachable, std::errc::timed_out, std::errc::connection_aborted
  */
  bool IsStdSystemError(std::errc& stdErrorCode) const
  {
#ifdef _WINDOWS
    const std::error_condition ec = utility::details::windows_category().default_error_condition(_errCode);
#else  
    const std::error_condition ec = std::system_category().default_error_condition(_errCode);
#endif
    if (ec.category().name() != genericCategoryStrg) 
    {
      return false;
    }
    else
    {
      stdErrorCode = std::errc(ec.value());
      return true;
    }
  }
 
  /**
    Check if the reported error comes from SSL.

    There is only one, generic server SSL certificate error at the moment!
  */ 
  bool IsSslError() const
  {  
#ifdef _WINDOWS
    const std::error_condition ec = utility::details::windows_category().default_error_condition(_errCode);

    if (ec.category() == utility::details::windows_category())
    {
      return _errCode == ERROR_WINHTTP_SECURE_FAILURE;
    }
    else
    {
      return false;
    }
#else
    const std::error_condition ec = std::system_category().default_error_condition(_errCode);

    if (ec.category().name() == genericCategoryStrg)
    {
      return false;
    }
    else
    {
      // ??? OPEN TODO:::: must test, but it will depend on the SSL version !!!
      // - patch Casablanca's ASIO code?
      // return _errCode == 336458004; //== 0x140DF114  // OpenSSL error code  ??OR?? 335544539 == 0x140000DB

      return true; // should be SSL 
    }
#endif
  }

private:
  int _errCode;
};
Note that on Windows, only a single generic SSL error code is supported by Casablanca 2.5. If we wanted to differentiate between SSL error types, we'd have to patch the Casablanca code (install a special callback handler to report the context of the HTTP error). I didn't do that, as a generic SSL error was good enough for my customer.

Now we can use this helper class like this:
  CCasablancaClientErrorCode clientErrCode = e.error_code().value();
  std::errc sysErrCode;

  if (clientErrCode.IsStdSystemError(sysErrCode))
  {
    switch (sysErrCode)
    {
    case std::errc::host_unreachable:
      // no connection error!
      break;
    default:
      // other error (std::errc::timed_out, std::errc::connection_aborted)
      break;
    }
  }
  else if (clientErrCode.IsSslError())
  {
    // SSL error!
  }
Having done that, the HTTPS support on the client side was ready.

Update: As mutual authentication wasn't required in this project, I didn't spend much thought on that feature, assuming everything would be OK. Prompted by a reader's question, I checked both my local copy and the newest Casablanca code on GitHub, but unfortunately I couldn't find any trace of support for mutual authentication. It seems it's not yet implemented (05/08/2017).

--
* I won't explain the workings of std::error_category here; for confused readers this was already done here (parts 1 to 5; I told you it's not a piece of cake).

Empirical Results on the Static vs. Dynamic Debate?


You may know that I'm quite interested in the "Static vs. Dynamic Typing" controversy, and already even clearly declared my allegiance basing my decision on some pretty sloppy reasoning (if not largely on the gut feeling...).

But what about some hard facts, numbers, measurements and controlled experiments to decide the debate? Like the Leibnizian* "Calculemus!". Well, some good soul (@danluu) took the burden of this task and looked** at the best known papers, talks  and comparison studies. To what avail?

First there's a warning:
"If you look at the internet commentary on these studies, most of them are passed around to justify one viewpoint or another. The Prechelt study on dynamic vs. static, along with the follow-ups on Lisp are perennial favorites of dynamic language advocates, and github mining study has recently become trendy among functional programmers."
So apparently, there's little interest to find out the truth, only to defend convenient positions. But we are fearless seekers for the truth, and want to know what the results are. Unfortunately the classic studies, when examined in more detail reveal some serious methodological flaws, and the author could only conclude that:
"Other than cherry picking studies to confirm a long-held position, the most common response I’ve heard to these sorts of studies is that the effect isn’t quantifiable by a controlled experiment."
OK, pretty sobering: we still know nothing, it's all lies, damned lies and/or lacking statistical knowledge. But how come the theoretically superior static typing (superior by a simple argument: how can more support from the compiler not be better than less support?) does not bring any measurable improvement?

Let's reiterate the position taken by supporters of dynamic typing in words taken from a talk at JavaZone:
If these academic computer scientists would get out more, they would soon discover an increasing incidence of software developed in languages such a Python, Ruby and Clojure which use dynamic, albeit strong, type systems. They would probably be surprised to find that much of this software—in spite of their well-founded type-theoretic hubris—actually works, and is indeed reliable out of all proportion to their expectations.
So it seems to work in real life, and the experimental studies cannot prove the contrary. But it also looks like someone is trying hard to ignore something, don't you think so?

But then I discovered a recent paper which proposed a new angle of attack!

While the old studies tried to measure the performance of programmers (and somehow failed to design a meaningful experiment), the authors of the new one pursued a more modest goal. They introduced random changes into a program's code and then tried to compile and run it. In their words:
"Our study is based on a diverse corpus of programs written in several programming languages systematically perturbed using a mutation-based fuzz generator.
....
More importantly, our study also demonstrates the potential of comparative language fuzz testing for evaluating programming language designs."
We thus could have a possibility to compare programming languages in some way? Sounds promising! So what are the results?
"The results we obtained prove that languages with weak type systems are significantly likelier than languages that enforce strong typing to let fuzzed programs compile and run, and, in the end, produce erroneous results."
The exact results are maybe interesting***:

  Language      compile    run
  Ruby          54%        31%
  Python        43%        25%
  PHP           48%        41%    (that poster child of bad language design!)
  Javascript    49%        23%
  Java          18%        15%
  Haskell       19.5%      18%
  C++           20.5%      12%

See? On average, dynamically typed programs are roughly twice as likely to compile and run with nonsensical source code. That's definitely SOME improvement, isn't it?

On the other hand: they are testing programming languages' resilience against typos, but wasn't that the exact reason for the introduction of types in programming languages? So we know that typed languages are good at what they were invented for. But we knew that already.

Update: This blogpost sets up a different thesis: the difference between statically and dynamically typed languages isn't that important. Rather, it is the simplicity of the language design that is decisive for avoiding bugs! IMHO this argument is misguided, because what I wanted to know is: given two languages of the "same complexity" (problems, problems, how to measure that?), which is the better one, the one with or the one without static typing?

--
*  Nice little fact - Leibniz's program for mathematization of human knowledge was formulated in the following words:
"Actually, when controversies arise, the necessity of disputation between two philosophers would not be bigger than that between two computists. It would be enough for them to take the quills in their hands, to sit down at their abaci, and to say  (as if inviting each other in a friendly manner): Let's calculate! (Calculemus!)"
Unfortunately, this cannot be done (at least according to Turing) and flame wars still prevail.

 ** http://danluu.com/empirical-pl/ - "Static vs. Dynamic Languages: A Literature Review"

*** Caution: they are not exact, I read it out of a diagram!

Friday, 15 January 2016

Enum's forward declarations


C++ Katie (@katie_cpp :) wrote a blogpost* about the new C++11 enums. First I was reluctant to read it, but then I risked a quick look, and indeed, I learned something new.

As I have been using Visual Studio most of the time over the last couple of years, it came naturally to me that I can forward declare enums:
  enum MsgType;  // we are in Visual Studio!

  class C
  {
  public:
    bool process(MsgType msg);
  };
but it seems to be only a nonstandard language extension, as normally it shouldn't be possible:
"It’s because the underlying type of enum is not known. The compiler needs to know how many bytes will be required to store the enum and it cannot be decided based on the forward declaration like above. It’s necessary to see the biggest value stored inside this enum."
Yessss, I can dimly recollect that in the past I couldn't do that (gcc), but then I tried once again, and Visual Studio let me get away with it. Thus the VS compiler seems to default the enum's size to int in such a case. With C++11's scoped enums, this behaviour is standardized:
"Let’s try the same with enum class. Now it works! ... The underlying type of enum class is implicitly known by the compiler (it’s int by default)."
As a corollary to the compiler's conundrum with plain enums, there's a catch I wasn't aware of (or didn't think of at first; after some hours of debugging I'd probably get it):
"Another consequence of unknown underlying type appears when the C++98 enum should be a struct member. The struct size can be different when the code is compiled on various machines!"
That can be a source of some nasty, hard to find bugs, believe me!

And there's a second thing I didn't know, namely that it's possible to specify the underlying type in a forward declaration for plain enums (just like in the new "scoped" enums):
  enum MsgType : int; 

  class C
  {
  public:
    bool process(MsgType msg);
  };
Did you know that? I didn't.
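Put together as a complete translation unit, the two standard ways of forward declaring an enum could look like this. The enumerators (Ping, Pong), the Priority enum and the urgent() member are invented here, just to have something to compile against.

```cpp
#include <cassert>
#include <type_traits>

enum MsgType : int;      // plain enum, underlying type fixed explicitly (C++11)
enum class Priority;     // scoped enum, underlying type defaults to int

class C {
public:
    bool process(MsgType msg);
    bool urgent(Priority p);
};

// The full definitions can follow later, e.g. in another header.
enum MsgType : int { Ping = 1, Pong = 2 };
enum class Priority { Low, High };

bool C::process(MsgType msg) { return msg == Ping; }
bool C::urgent(Priority p) { return p == Priority::High; }

static_assert(std::is_same<std::underlying_type<MsgType>::type, int>::value,
              "fixed by the forward declaration");
static_assert(std::is_same<std::underlying_type<Priority>::type, int>::value,
              "int is the default for enum class");

inline void enum_demo() {
    C c;
    assert(c.process(Ping));
    assert(!c.urgent(Priority::Low));
}
```

Because the size is known from the declaration alone, a struct containing such an enum also has the same layout wherever the same underlying type is used.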

At first I thought it would be a C++98 feature already** and was a little bit ashamed of myself: all those years doing C++98 and I didn't know that. Ooops! But then I looked it up and saw that it is a new C++11 feature. So, I'm still learning C++11, apparently.

Summary

All in all a nice post*, <praise> not simply reiterating some new syntax/feature, like so many other bloggers, but also giving some juicy technical bits away! </praise>

Update:
Recently, I saw this information on SO: "Microsoft's compiler has allowed specifying the underlying type of an enum as a proprietary extension since VS 2005."

--
* https://katecpp.wordpress.com/2015/12/28/enum-class/, all citations are from that post.

** that could be a single critique point for the above blogpost... but a small one!