Saturday, 15 December 2007

C++ pocket lambda library


What I planned to do at first was to write an article for my old homepage to describe my private C++ lambda library implementation, as I wanted to jazz up my web-site (and to brag a little too...). And only because I couldn't write that article this very blog was born. So at last I'm going to do what I should have done long ago, despite of the pre-christmas madness, and the fact that I've got to find a new project middle in the holiday season!

1. The Problem


I like STL. But one thing always drives me crazy! In C++ you cannot write this:
  for_each(v.begin(), v.end(),
           new class<T> Homomorphoize { void operator()(T* t){do_it(t);}( ));
nor even that:
  class<T> Homomorphoize { void operator()(T* t){do_it(t); };
  for_each(v.begin(), v.end(), Homomorphoize() );
You must declare the doer Homomorphoize functor in the global scope, as to be able to use it in the for_each template instantiation. Yada, yada, yada? Let's see what it means in practice. One of my C++ files looked roughly like that:
  //--- helper classes: no inner classes in C++ :-((
  ...
  class Finalizer
  {
    public: void operator()(Worker* w) { ...
    ...
  };
  class Starter
  {
    void operator()(Worker* w) { ....
    ...
  };
  class NotifSrcFilter : public Worker::TestEventObj
  {
    virtual bool test(D_ev* ev) const { ....
  };
  
  // --- main
  main() {
    ...
    // and start
    for_each(m_workers.begin(), m_workers.end(), Starter());
    ...
And as I like STL rather much, there will be always a couple of such classes at the start of each of my C++ files. So what is so wrong with it? Certainly you could live with this, couldn't you? Well, I don't like the separation of code parts pertaining to a single task. Typically the functors so defined are rather short and will be used in a one single place. So IMHO it's a waste of time looking up the functor definition from the place where it is used in some STL algorithm. And it always annoys me when I'm reading the code: the long prelude which must be skipped to arrive at the real code!

2. The solution


So what could be done? There are several alternatives.
  1. Write the code in Perl, Python, Groovy, etc: these languages have some support for lambda expressions, and many of us would readily jump to the occasion, however in commercial environment it is not always an option. And beside this, not everything would be possible in Python, for example, as it doesn't support full-blown lambda functions, only lambda expressions! In Java there are anonymous classes and they can be used to do exactly that, but they aren't fully functional closures too*.

  2. Write an general functor which can be parametrized before it's used in an STL algorithm: I think I've seen such an implementation in the "Imperfect C++" book, but I was not really convinced. For brevity's sake I won't discuss this technique here.

  3. Use the Boost lambda library: it's exactly what we need here! However... it's not a part of the standard C++ library, and so it's sometimes not an option in commercial environment! And it's big: did you look into the code? In includes other parts of Boost and is not very comprehensible (I didn't understand much)! We'll see later that this is inevitable as the complexity of the library increases, but I don't want to do everything what goes! There is only some subset of the general lambda functionality which I'd use in many programs.
So the result of this investigation was a decision to write small, lightweight and comprehensive lambda library for my own usage. I simply wanted to clear up my code even in a Boost-free environment!

The first thing was to get the requirements right: in this case, to decide what are the most often use cases are. I had a look at my code and found and my initial idea was to do some basic things like:
  find_if(vec.begin(), vec.end(), lambda(_$1 <= 2));
  for_each(vec.begin(), vec.end(), lambda(cout << _$1 <<  "\n"));
  sort(vec_ptr.begin(), vec_ptr.end(), lambda(*_$1 <= *_$2));
  find_if(vec_ptr.begin(), vec_ptr.end(), lambda(_$1->getValue() != 7));
  for_each(vec_ptr.begin(), vec_ptr.end(), lambda(delete _$1));
As I said, we don't want a big, complete lambda library, only a pocket edition. Here you can see the lambda function denotation, an idea which I abandoned rather soon. Frankly, I didn't know how to implement it. I knew from general descriptions of Boost lambda library, that the lambda expression should be generated by compiler via operator overloading not via a function**. So let's continue with the operator overloading idea. But how exactly should the overloading code be working?

Then Bjarne came to the rescue: he presented a following technique in as late as 2000***:
  struct Lambda { };  // placeholder type (to represent free variable)
  template<class T> struct Le // represents the need to compare using <=
  {
     // ...
     bool operator()(T t) const { return val<=t; }
  };

  template<class T> Le<T> operator<=(T t, Lambda) { return Le<T>(t); }
  
  // usage:
  Lambda x;
  find_if(b,e,7<=x); // generates find_if(b,e,Le<int>(7)));
                    // roughly: X x = Le<int>(7); for(I p = b, p!=e; x(*p++));
Do you see how simple it is? The Le<> class represents the comparison and is generated whenever a comparison is needed. How does compiler know is it needed? By the templated <= operator which is invoked as soon as compiler sees the usage of an Lambda typed variable. So the Lambda variable is just a bait for the compiler to lure it into generating our special Le<> class! This is the reason why it's referred to as a placeholder variable.

Having clarified that, we can write code to do the following:
  find_if(vec.begin(), vec.end(), _$1 <= 2);
  find_if(vec.begin(), vec.end(), _$1 == 1);
OK, tacitly we've done some design decisions: we won't use the lambda denotator, and we don't need to declare the placeholder variable, as it'll be part of the library. Moreover, we decided how the placeholder variables (yes, there will be more of them) are named. As I'm coming from Unix and shell tradition, the $1 syntax seemed more natural to me than the functional languages "_" placeholder as used in Boost. But the $1 syntax made my assembler suffer (sic!) and I decided for the middle ground: _$1!

3. The fun begins


Encouraged by the initial success we'd like to write more lambda expression using the same basic technique. Let us try to implement this:
  vector<int*> vp(10);
  find_if(vp.begin(), vp.end(), *_$1 );  // hack: means find if not 0
i.e we'd like to dereference the actual iterator. Well, this requires a somehow different approach: an operator on our placeholder variable. Not very difficult stuff. We extend our placeholder class like this:
  template <int Int> class placeholder : public lambda_expr
  {
    // ...
    public: Deref operator*() { return Deref(); }
  };

  placeholder<1> _$1;
  placeholder<2> _$2;   // etc...
So the * operation on a placeholder returns an instance of the Deref class, which looks like this:
  struct Deref : public lambda_expr
  {
      Deref() {}
      template<class T> T operator()(T* t) const { return *t; }
  };
i.e. it will, in turn, dereference its argument (the iterator) when invoked by STL via the function call operator! Simple! In the same manner we can define an Addr<> class and overload placeholder's address operator, which allows for the following code:
  // init a vector of integers
  vector<int> v10(10);
  for_each(v10.begin(), v10.end(), _$1 = 7);
  
  // construct an index vector
  vector<int*> vp(10);
  transform(v10.begin(), v10.end(), vp.begin(), &_$1);  // store ptrs to v10!!
  sort(vp.begin(), vp.end(), *_$1 <= *_$2);  // now sort the index instead of the data
Cool, we have obtained pointers to all elements of v10 and stored it away! And before we initialize the source vector elements to a value of 7! How did we do it? As the astute reader will probably know, we defined an Assgn<> class, which will be returned by the overloaded assign operator of the placeholder<> class. The sort using dereferenced comparison is a piece of cake for us! Or is it? Well, here we have placeholders on both sides of the comparison. In math-speak we no more need an unary operator on placeholders (like: *, & or ++), wee need a binary one! So let's define it:
  struct Le2 : public lambda_expr
  {
      Le2() { }
      template<class T> bool operator()(T a, T b) const { return a <= b; }
  };

  Le2 operator<=(placeholder<1>, placeholder<2>) { return Le2(); }
Not really different from what we've had before, is it? But wait, didn't we forget something? What about dereferencing the iterators before comparison? Well, we need one layer more for that:
  template <class S, class T> struct Le2_forw : public lambda_expr
  {
    S e1;
    T e2;

    Le2_forw(S s, T t) : e1(s), e2(t) { }
    template <class U> bool operator()(U a, U b) const { return e1(a) <= e2(a); }
    // ...
  };

  template <class S, class T>
    Le2_forw<S, T> operator<=(S s, T t) { return Le2_forw<S, T>(s, t); }
Now we are using the forwarder Le2_forw<> to sore any expression used on both sides of the <= comparison, remember them an then invoke them when the final comparison is done. Uff! OK, but this one will work without forwarding layers!
  find_if(vp.begin(), vp.end(), *_$1 == 1);
Or at least it should: we defined the operators for comparing placeholders to a value and dereferencing them! Well not, in the simple form as they are coded, they cannot be combined! We need to write an extra class to handle this combination:
  template<class T> struct EqDeref : public lambda_expr
  {
    T val;
    EqDeref(T t) : val(t) { }
    bool operator()(T* t) const { return val==*t; }
  };

  template<<lass T> EqDeref<T> operator==(T t, Deref) { return EqDeref<T>(t); }
Exactly as in the forwarding case above. A little annoying, isn't it? We need forwarders and/or combined operators en masse! But on the other side, it's still simple, annoying but simple. When you compare this to more advanced and layered, orthogonal implementation there is one clear advantage: simplicity. If I cannot compile some expression, I can extend the code without any problems. It's a simple library.

4. And the fun ends


Everything is so simple! Let's try some more easy lines:
  for_each(v.begin(), v.end(), cout << "--- " << _$1 << "\n"); 
Uh-oh... Now we don't even have an binary or ternary operator - we have an unary operator which can be chained and becomes effectively an n-ary one, where n is unlimited!

Well, how about this:
  struct XXX { int value;  bool isSeven() { return value ==7;};  };
  vector<XXX> vx(10);
  find_if(vx.begin(), vx.end(), _$1.value == 7);
  find_if(vx.begin(), vx.end(), _$1.isSeven()); 
We cannot overload the . operator in C++ as for today. But this is obviously a very useful piece of functionality, so maybe we can come up with something similar?

And then, let's us be brave and try this:
  for_each(v10.begin(), v10.end(), if(_$1 == 1, _$1 = 7));
  for_each(v10.begin(), v10.end(), if(_$1 == 7, ext_counter++)); 
i.e. to execute an action depending on the value of the current element under the iterator. And what about function currying and closures? Arrgh, problems... But they can be solved. Look out for the part 2 of this article!

PS: As you could see, there is room for improvement in the implementation, but I only wanted to get the code work.

---
* you can write this in Java:
  Collections.sort(anagrams,
                   new Comparator<List<String>>() {
                         public int compare(List<String> o1, List<String> o2) {
                           return o2.size() - o1.size();
                         }
                       });
cool, I like it! But you cannot change any variable from the outer scope, as there are no closures in Java (∀ j∈J: j <= 6)! So the ext_counter example from above wouldn't work, and besides: the Java algorithms library is no match for STL! (comments encouraged...)

** Now I think that it would be perhaps possible to add the lambda denotation, but only to discard it in the process and go on with operator overloading.

*** Speaking C++ as a Native (Multi-paradigm Programming in Standard C++): http://conferences.fnal.gov/acat2000/program/plenary/STROUSTRUP.pdf.

3 comments:

soooootjsmmmsa said...

Thats simply wonderful, thanks for the post!

Anonymous said...

I suggest that you use the term function object instead of fu[n]ctor in the line:

"Write an general fuctor which can be parametrized before it's used in an STL algorithm: I think I've seen such an implementation in the "Imperfect C++" book, but I was not really convinced. For brevity's sake I won't discuss this technique here."

for the obvious reason, and for the reason that functor has another more prevalent mathematical definition. I also suggest that you delete this post afterwards.

Marek Krj said...

Hi anonymous!

Well, the term functor is well established in (C++) programming: see http://en.wikipedia.org/wiki/Function_object. On the other hand I know some math, but no category theory, so there is no danger of misunderstandig ;-).