Thursday, 13 March 2014

Another case for embedded DSLs


As already stated in my previous post I'm not very convinced about embedded DSLs, I cite:
Until recently I'd only sneer at the idea of DSL's: what the @!%*, either we have a solid API for developers, or a specialized higher level language for the specific task. And the higher level language shouldn't be done at all in the best case, because the non-technical user just wants a smooth GUI to click his requests together! So spare your DSL hype on me, go and find another gullible middle management person.
Did you notice the "until recently" moniker? As it turns out, I found some use for E-DSLs in the end, but a general suspicion remained. But recently I happened to hear a presentation by Joel Falcou and that has shown me another use for E-DSLs!

The talk was about:
... using the Boost.Proto library - a C++ EDSL toolkit - to redesign NT2, a C++ scientific computation library similar to MATLAB in term of interface.*
The talk was mainly about template metaprograming, expression templates, and integrating hardware descriptions into the system, but one phrase catched my ear, namely: " similar to MATLAB in term of interface"! Look below for how you could replace MATLAB with NT²:

Recipe:
  • Take a  .m  (i.e. MATLAB ) file, copy it to a  .cpp file
  • Add  #include <nt2/nt2.hpp>  and do cosmetic changes
  • Compile the file and link with libnt2.a
And what are the "cosmetic" changes, s'il vous plait?

MATLAB code:
R = I(: ,: ,1);
G = I(: ,: ,2);
B = I(: ,: ,3);

Y = min ( abs (0.299.* R +0.587.* G +0.114.* B) ,235);
U = min ( abs ( -0.169.*R -0.331.* G +0.5.* B) ,240);
V = min ( abs (0.5.*R -0.419.*G -0.081.* B) ,240);

equivalent NT² code:
auto R = I(_,_ ,1) ;
auto G = I(_,_ ,2) ;
auto B = I(_,_ ,3) ;
table <float , of_size_ <N,M> > Y, U, V;

Y = min ( abs (0.299* R +0.587* G +0.114* B) ,235);
U = min ( abs ( -0.169*R -0.331* G +0.5* B) ,240);
V = min ( abs (0.5*R -0.419*G -0.081* B) ,240);

What struck me, was that you can switch from the proprietary DSP system to a free C++ replacement that simply! OK; NT² isn't a perfect drop-in replacement, but it's good enough if we trade off royalties to be paid against it!

And thus the second use-case for embedded DSLs emerges: lure the user of another language into C++! Or putting it differently: give the user a C++-alternative and let him choose.

BTW: did you notice that the EDSL use-cases always involve the ominous "user", which doesn't really know C++?

--
* Slides for the talk are here.

Sunday, 16 February 2014

Who's to blame? - or the most amazing multithreading bug which wasn't one!



Typically bug-hunting is quite simple, as not to say trivial. But sometimes you encounter some species of bugs that makes you mad for weeks!!! Programming pundits maintain that this are the C++ multithreading bugs, because locks, well..., um..., they just don't compose! So what are we supposed to do? Either we should wait for transactional memory support, or switch to another language, preferably Haskell or Clojure. OK, will do that, but in the meantime there are products to be developed, delivered and sold*. So that's what happened:

1. The Bug

My current project is developed on Windows with Microsoft's Visual Studio as IDE and Intel's XE as a plug-in compiler. We choose Intel XE because it is (unsurprisingly) very good at optimizing code for the Intel platform and, not least, because of its support for the new C++11 standard. Our code is full of boost::thread, boost::mutex, std::atomic's and its ilk, so it falls in the category the gurus are warning of. But - the architecture of the system was recycled from the previous product, so there's no real use in complaining about the non-composability**, and, as we are in the business of streaming files out, the performance is more important than advanced architectural concepts. I just removed a couple of vtable race-conditions from the code and the whole stuff seemed to be running well enough!

But then we changed the Boost version, updated the compiler version, at the same time I made some changes to the threading model to make the internal architecture more dynamic, and booom!, everything stopped working in a huge crash! 


It was "Illegal instruction" and sometimes "Privileged instruction" all the time. What? I'm not issuing assembler instructions in my code! This is definitely something where I can shift the blame on the compiler! On the other side, the C++ standard says that data races constitute the dreaded undefined behavior, in short, that all your bets are off. So why not an illegal instruction? But then there was another type of crash: "Access violation". OK, that's something you could definitely blame on your programming or a data race.

Let me put one thing straight - the crashes came only when the debugger was attached; without debugger the process run without a hitch. But unfortunately this fact doesn't mean very much in the art of debugging multithreading applications - debugger changes program's memory boundaries, ant maybe only accelerates the manifestation of bugs that are otherwise hiding in dark corners of code wanting to become big, ugly Heisenbugs!

First I suspected another vtable race condition and indeed fixed some more instances of it, but that didn't change the overall picture! Next, after inspecting the assembly code following problems were revealed:

First, there were crashes in code like that:
5DF3B12C mov dword ptr [ebp-38h],edi 
5DF3B12F mov dword ptr [ebp-3Ch],esi 
5DF3B132 mov dword ptr [ebp-40h],ebx 
5DF3B135 mov dword ptr [this],ecx 
5DF3B13B mov eax,dword ptr [___security_cookie (5E054DC0h)] 
5DF3B140 xor eax,ebp 
5DF3B142 mov dword ptr [ebp-14h],eax 
5DF3B145 mov dword ptr [ebp-1E0h],eax 
5DF3B14B mov byte ptr [ok],1
where debugger was pointing at mov byte ptr [ok],1 as illegal instruction! What is here illegal! No advance was possible in this case.

But more often I could see something like:
5DF3B152 mov eax,dword ptr [input_size] 
5DF3B155 db 89h 
5DF3B156 test eax,edx 
5DF3B158 db feh 
5DF3B159 db ffh
here the current instruction was, quite understandable, db feh. The problematic part was that before the crash this code looked completely different, i.e. there wasn't a trace of the suspicious db instructions! Well, that way or another, it looked like some dangling pointer problem where memory gets overwritten at some arbitrary address. Or does it? On Windows you cannot overwrite the program section of the process, as there is the code segment protection enabled by default (at least nowadays). Whatever! - I tried to set a data breakpoint ate the changed location, but no change at the given address was reported by the debugger. What the heck, I see that my assembly codes changed, how can debugger be so stupid! So there wasn't any progress on this avenue either.

The second general class of errors was the one causing access violation. It emerged, than an innocuously looking stack pointer manipulation like add esp,0FFFFFFF0h would produce illegal stack pointer value blowing the program off, even though the previous ESP value was well in the allowed range! Looks like sorcery!
5FB4B906  cmp         eax,0BCh
5FB4B90B  jae         5FB4BC25
5FB4B911  add         esp,0FFFFFFF8h
5FB4B914  lea         eax,[conv]
5FB4B91A  mov         dword ptr [esp],2
with register contents like that:
EAX = 322CE9AC EBX = 1BB16DE0 ECX = 359A56A0 EDX = 00000021 ESI = 322CED30 EDI = 322CEBB4 EIP = 5FB4B91A ESP = FFFFFFFC EBP = 322CEB84 EFL = 00010286
Well, with stack pointer corrupted, the result of this was:


Here's another beauty - debugger reported a stack corruption here, as it was completely bewildered with what was going on:



So how was this bug to be handled, and who shall win the blame game?

2. The Explanation.

I tried to inspect the memory directly in the memory window of the Visual Studio debugger, but it was somehow too low lever for my mood at that moment. So because:
  • the problems always occurred in the vicinity of a break point,
  • the debugger was always displaying correct values of the operands in the "Access violation" case,
  • the debugger was never showing the illegal instructions, only after the crash happened,
I decided that the single explanation left was that it's something to do with the tools. Therefore I duly reinstalled everything in my toolchain, Visual Studio, Intel compiler, Boost libraries, in the vague hope that the installation of some of them could be broken. Alas, nothing seemed to help.

At this point, I declared the issue as solved - I had some real work to be done, and if you didn't define excessively many breakpoints, the debugger typically played to the rules. Except when it really mattered of course (sigh).

I explained the problem to my worthy colleague Bernhard Z. so he could take it up, and he investigate it a little bit further. And indeed - he found out some very interesting facts about how exactly the debugger has been guilty!!! 

Contrary to what Visual Studio is telling you, it is possible to attach to a running process with two debuggers (!!!). It is done like that - first attach to it with Studio, and then start WinDebug (don't forget to check the "Nonintrusive" option, or it will complain!). Now, Bernhard did it, and he could have an independant view on the machine code!

An than, with the second view on the assembly code, the problem became obvious - the debugger inserted the debug-break instruction (int3 <=> 0xcc) at the wrong position in the machine code - not before the opcode, but 2 bytes behind it, making the arguments following the opcode to illegal instructions! Look here at the assemble code before setting the breakpoint (the line where the breakpoint has to be set is in bold):
00c7461f 89542404        mov     dword ptr [esp+4],edx
00c74623 e888d00000      call    BoostSerialize!save_data_xml (00c816b0)
00c74628 83c410          add     esp,10h
00c7462b 83c4f0          add     esp,0FFFFFFF0h
00c7462e 8d9560ffffff    lea     edx,[ebp-0A0h]
00c74634 83bd74ffffff10  cmp     dword ptr [ebp-8Ch],10h
00c7463b 8d85c0fdffff    lea     eax,[ebp-240h]
00c74641 0f439560ffffff  cmovae  edx,dword ptr [ebp-0A0h]
00c74648 890424          mov     dword ptr [esp],eax
00c7464b 89542404        mov     dword ptr [esp+4],edx
00c7464f e88cca0000      call    BoostSerialize!save_data_binary (00c810e0)
00c74654 83c410          add     esp,10h
00c74657 83c4f0          add     esp,0FFFFFFF0h
00c7465a 8d9560ffffff    lea     edx,[ebp-0A0h]
00c74660 83bd74ffffff10  cmp     dword ptr [ebp-8Ch],10h
00c74667 8d85c0fdffff    lea     eax,[ebp-240h]
00c7466d 0f439560ffffff  cmovae  edx,dword ptr [ebp-0A0h]
00c74674 890424          mov     dword ptr [esp],eax
00c74677 89542404        mov     dword ptr [esp+4],edx
00c7467b e8d0b90000      call    BoostSerialize!save_data_string (00c80050)
And now the assembler code after the breakpoint has been set:
00c74623 e888d00000      call    BoostSerialize!save_data_xml (00c816b0)
00c74628 83c410          add     esp,10h
00c7462b 83c4f0          add     esp,0FFFFFFF0h
00c7462e 8d9560ffffff    lea     edx,[ebp-0A0h]
00c74634 83bd74ffffff10  cmp     dword ptr [ebp-8Ch],10h
00c7463b 8d85c0fdffff    lea     eax,[ebp-240h]
00c74641 0f43cc          cmovae  ecx,esp
00c74644 60              pushad
00c74645 ff              ???
00c74646 ff              ???
00c74647 ff8904248954    dec     dword ptr [ecx+54892404h]
00c7464d 2404            and     al,4
00c7464f e88cca0000      call    BoostSerialize!save_data_binary (00c810e0)
00c74654 83c410          add     esp,10h
00c74657 83c4f0          add     esp,0FFFFFFF0h
00c7465a 8d9560ffffff    lea     edx,[ebp-0A0h]
00c74660 83bd74ffffff10  cmp     dword ptr [ebp-8Ch],10h
00c74667 8d85c0fdffff    lea     eax,[ebp-240h]
00c7466d 0f439560ffffff  cmovae  edx,dword ptr [ebp-0A0h]
00c74674 890424          mov     dword ptr [esp],eax
00c74677 89542404        mov     dword ptr [esp+4],edx
00c7467b e8d0b90000      call    BoostSerialize!save_data_string (00c80050)
Do you see it now? In Visual Studio's debugger you wouldn't see anything because Visual Studio hides debug-breaks in the disassembly view! That's the typical Microsoft attitude - we know better what you'd like to see - and they are right with it in about 95%. ... until it bites you!

The second type of crash can now be explained too: 0xcc overwrote the correct operand of the add esp instruction, the addition overflowed, and then we got the access violation!

A "trouble ticket" was issued to Intel: http://software.intel.com/en-us/forums/topic/489007. They didn't believe it of course...

3. The moral of the story

Only after that all became clear we could positively be sure that this wasn't a multithreading bug, and were able to continue the development as before ;).

The moral is simple, and already well known - sometimes you can blame it on your tools. 
Plus perhaps another well known one - you always need a second opinion (i.e. WinDebug's)!

And because it seems to be a good time for some bragging, it wasn't the only "you can blame it on your tools" moment in my programming career - a couple of year ago I diagnosed broken multithreaded exceptions/stack unwinding implementation in a port of the gnu compiler to some obscure UNIX derivative (won't tell the names, however, but it was rather a big Telco company). As a result the whole project had to disable exceptions in the compiler. Don't believe it's possible young Jedi? Yes, it is (or it was at that time - I never needed it again).

--
* some discussion about why locking & mutexes are the scapegoats of programming pundits can be found in the "Is Parallel Programming Hard, And, If So, What Can You Do About It?" book here

** for myself I'd opt for a share-nothing, message-passing, "worker threads"-oriented architecture - but well, nobody asked ;). And that post about "Threads as Workers" isn't still written anyway.

Tuesday, 12 November 2013

Functional programming, immutability and the World.


In C++  if you write a cooking_meat() function it returns NOT a new piece of meat, but one with a changed state.

In Clojure we have:
  • immutability - so we get here a NEW piece of meat
In real life it'd be not a wholly new piece of meat!
  • world -> constantly changing state, we cannot understand all of the Change
  • thus -> functional programming - our unability to cope with Change, to understand the world we are living in?

The symbol-pointer equivalence.


Recently, I realized an interesting little fact when comparing LISP-type and C-type languages. It may seem that, by their design and philosophy, they cannot be more distant from each other, but well, it all boils down to memory locations and addresses.

Let's take the a LISP symbol a, and a C++ object obj. If you take a for itself, it resolves to its value, if you don't want that, you have to quote it. On the other side, of you take a pointer to the C++ obj, you have to dereference it directly if you want the underlying value:
  'a <=> &obj
   a <=> obj
and
   a <=> *obj_ptr
  'a <=> obj_ptr
I think it's cool. Think of symbols as pointers.

Birthday paradox, Lotto and learning


Warning: this post is about how we learn but in reality we don't.

You may know the "birthday paradox" - it's a cute little story in probability theory, and as it happens it might be useful to know in a programming interview. So many of us learn about it and might even be able to derive the probability bound and to reason about it. But do we really learn? Or do we just memorize some story which will be called out in the "technical interview"?

Let us apply it to the case of lottery gambling. ...

The hard question here is - which numbers should I choose to win the jackpot. Well, the answer is, we don't know. The not so hard question is - which numbers NOT to choose. Here we can definitely offer some responses. We shouldn't choose the numbers most people would choose. Can you sense the application here? Well, most people will use their birthday date in some form (as it's the number they are most accustomed to). And the birthday paradox teaches us that... we ourselves shouldn't use it!

Would you see that? Possibly not. No party, no room, no people talking to each other as in the classic setting of the story. That's sad. That's not what I like about maths (and programming). But isn't it like that in the real life? If you'll be interviewed you have to reproduce the known answer first. If you're good at it, some deeper questions will follow, most of them with a known answer. And we learn that stuff to be used in an interview... But is that learning?


PS: admittedly, this isn't any direct application of the paradox, but rather a "reverse" one, nevertheless I see it as cool math giving hints in seemingly unrelated problem. And when we learn about it, we don't play enough with ideas, we just want to "pack it in" and have it ready for that interview.

Wednesday, 30 October 2013

Some basic decltype-SFINAE


In this blog I'd like to give you the impression that I'd know everything ;-), but, alas, that's not true. Mainly because with the advent of C++11 there's a lot of things for me to learn. You may say I'm late to the party, and yes, I was somehow lazy playing (or rather not playing:) with C++11 compilers. That was because my clients didn't want to use the newest features yet (you know, a bunch of conservative ... etc.). But now, all of a sudden, I can use every C++11 and Boost feature I want in my daily work (yay!!!), and there's definitively a difference between playing with something and using it on a daily basis!

Back to the things I should learn. There are many of them, but we'll take the decltype-SFINAE as an example. The what-SFINAE? Or: the decltype-what?? Or, even worse: the what-what???

So what is SFINAE really? You can google it, or you can just take my word that SFINAE works by purposely introducing compile errors (thus the SF part in the name) which will be then ignored by compiler, (the INAE part in the name) just to remove one of template specializations to suit your purpose!

The idea is that for one class you'll have one set of template definitions (where the unwanted ones will be ignored by compiler), and for an another class a different one. Simple, isn't it? ( BTW as I'm getting older and older I came to recognize that most of the human ideas are simple, but some of them just aren't clearly stated; that seems to be an ego-thing for some people...)

So what is the decltype-SFINAE? Simple again, it's using SFINAE in the C++11 auto+decltype syntax for function declarations. Like here:
  template <class T> auto xxx() -> decltype(XXX, string) { ... }
How's that? Remember, decltype takes an expression as argument, and comma operator in expressions renders the last part as the value, so everything before the comma (i.e. the XXX thing) can be used to fool the compiler into/out of SFINAE. Tada!

When I first saw this technique in action*, used for detecting if some method is defined in a given class, there was an utility test class defined:
  template<typename T> struct has_size_method 
  {  
    //...

    template&ltclass C> static yes test(SFINAE<C,&C::size>*); // won't explain this impl.!

  public:
    static constexpr bool value = std::is_same<decltype(test<T>(nullptr)),yes>::value;
  };
and then it could be used like that:
    cout << has_size_method<ClassXXX>::value;
I wanted the same functionality (i.e. detection of a special optional method) but I didn't want another utility class. Frankly speaking, I'm more for the direct approach, not for burying simple concept under irritating levels of indirections. I a word, there had to be a better way! So I tried this:
  // got get_channel_id()?
  template <typename T, typename U>
  auto mk_object_name(T* obj, U prefix) 
    -> decltype(std::declval<T>().get_channel_id() == 1, std::string()) 
  {            
    return format_strg(prefix * 1000 + obj->get_channel_id(), std::hex, typeid(*obj).name());
  }
  
  // nope!
  template <typename T, typename U>
  std::string mk_object_name(T* obj, U prefix, T* obj2 = nullptr) 
  {
    return format_strg(prefix, std::hex, typeid(*obj).name());
  }
The first overload will be removed by SFINAE if the type T doesn't have the get_channel_id() method, the second one will be always present in the normal case, and let's hope that it's sufficiently different for the first one. Simple, isn't it? Then I used it like this:
  struct XXX {
    int get_channel_id() { return 1; } 
  };
  struct YYY {
    int get_YYY_id() { return 1; } 
  };

  main()
  {
    XXX x;    
    std::string nameX = mk_object_name(&x, 100);
    std::cout << nameX << std::endl;

    YYY y;    
    std::string nameY = mk_object_name(&y, 200);
    std::cout << nameY << std::endl;
 }
Alas, it didn't work for the XXX struct. That is what comes from hoping that the templates will just match! The obvious problem here is, that in this case both overloads are enabled. You did spot it at once? Sorry, I said this is a post about basic(!) decltype-SFINAE! So what can we do to hide the base version if the extended version is possible? A template function for dispatching to the desired overload proved to be sufficient:
  // actual functions
  template <typename T, typename U>
  auto mk_object_name_helper(T* obj, U prefix, int) 
    -> decltype(std::declval<T>().get_channel_id() == 1, std::string()) 
  {            
    return format_strg(prefix * 1000 + obj->get_channel_id(), std::hex, typeid(*obj).name());
  }

  template <typename T, typename U>
  std::string mk_object_name_helper(T* obj, U prefix, long) 
  {
    return format_strg(prefix, std::hex, typeid(*obj).name());
  }

  // "dispatching" function
  template <typename T, typename U>
  std::string mk_object_name(T* obj, U prefix) 
  {
    return mk_object_name_helper(obj, prefix, 0);
  }
And then it worked like a charm**. I think, it's not too much overhead when compared to the has_size_method template shown above. The dispatching function just uses the fact that int is a better overload for zero than long.

You can compile the above code online, for example at http://gcc.godbolt.org/ use g++-4.8 with the -std=c++11 option. The nice thing there is that in the assembler window you can see what template overload was selected. Additionally you'll see the famous template bloat in action! It's fun! If you'd only like to let it run online, try http://liveworkspace.org/.

PS: And what was it all for? Did you get it? We were generating object names for various "software device" types, some supporting only the board IDs, some offering additional support for channels.

--
* look for example here: http://dev.krzaq.cc/checking-whether-a-class-has-a-member-function-with-a-given-signature/

** There's another way, of course! Facebook's Folly library accomplishes it with enable_if and the pair of is_same and !is_same in the definition of the return value of a function:
  template <class T>
  typename std::enable_if<
    HasLockUnlock<T>::value && !std::is_same<T, boost::shared_mutex>::value>::type
  acquireRead(T& mutex) {
    mutex.lock();
  }
  //Special case for boost::shared_mutex.
  template <class T>
  typename std::enable_if<std::is_same<T, boost::shared_mutex>::value>::type
  acquireRead(T& mutex) {
    mutex.lock_shared();
  }
 Look here for overview of how the classic enable_if<> can be used. 

Thursday, 17 October 2013

How does C++11 bugfixing in C++14 look like?


Maybe you've heard it already, but the planned C++14 standard will be a small, bugfixing release for the C++11 specification and not a true next version of C++.

Well, not really (see here), there are a couple of interesting genuine new features added! For example: generic lambdas (yay! see my old post...), lambda capture initialization expressions, function return type deduction (i.e. auto on functions), templates for variables, runtime-sized arrays (int arr[varX]), binary literals (011100b), shared mutexes and locks, std::optional, standard user-defined literals (like "xxx"s for std::string), std::dynarray, tuple access via type (get<int>(tupleX)). I don't know what you think, but for me that's quite a list of new features (and that's not a complete list) to be digested!

But some of the changes are real bugfixes. For myself, I couldn't imagine what bugfixing could be supposed to mean - are there bugs in the standard? With best C++ gurus working on that? Well, unfortunately there are! One of them is the absence of std::make_unique() counterpart to the std::make_shared(). What is is the reason for that? Herb Sutter explains:
" That C++11 doesn’t include make_unique is partly an oversight, and it will almost certainly be added in the future. "
 What, an oversight! But beacuse of that you cannot write a safe code for multiple resource initialization using std::unique_ptr! That (and hearing them speaking on Going Native 2013) convinces me that the gurus are just humans and developers like you and me!

Another case in point is taken from a discussion on complex initialization idiom. Here's the code:
  const int i = [&]{
    int i = some_default_value;
    if(someConditionIstrue)
    { 
      // Do some operations and calculate the value of i;
      i = some calculated value;
    }
    return i;
  } () // note: () invokes the lambda!

Well, it looks absolutely innocuous (and kind of JavaScript like!), and it will even compile. But Andrzej Krzemieński pointed out that:
I believe it is not a valid C++11 code. According to sect 5.1.2 para 4, when the body of the lambda is not a simple return statement its deduced return type is void. While the expectation that this example would work is logical, and likely acceptable in C++14, C++11 seems to refuse it.
And you know what? Herb explains:
The committee has already pretty much agreed that this should be allowed (which is why our compiler also already supports it, and with GCC and Clang that would make it de facto portable), and this extension is likely to be voted into the C++ working paper soon (this month, or September).
Thus making it a classic example of bugfixing! Well, not really, the standard says that lambda return type can be deduced, but only when there is exactly one statement, and that statement is a return statement that returns an expression. OK, but compilers can do much more than that already, so it's an unnecessary restriction we can do away with! Thus in this case it's not really an oversight, but rather lack of time to process feedback from compiler writers. Bjarne was always saying that the committee is doing unpaid work and that mostly in their free time.

Update:

Ooops, I forgot that std::optional and std::dynarray were moved out of the standard and parked in TS.  Thanks for correction goes to @meetingcpp!