On Software and Languages

Thursday, 31 January 2008

C++ pocket lambda library, part 2

In the previous post in this series* I promised to tackle some more advanced topics from my pocket lambda library. To start with something, let us gratuitously choose "currying" first. Do you know what currying is? No, it has nothing to do with food, but rather with an american mathematician named, ehm..., Haskell B. Curry. In his formalization of the notion of a computable function (λ-calculus) he used (for simplicity's sake) only functions with one parameter. Come again? What about all the other functions? For this he used a trick: he defined a function which returns a new function where the first argument is fixed, and the second one remains unbound. When used recursively, we can reduce any function of n-arguments to n functions of 1 argument! Mathematics can be as cool as programming sometimes!

I bet you've seen (and used) it before! Yes, STL provides bind1st and bind2nd functions, which are doing exaclty that: currying! In other languages like Python, Perl, Ruby, Haskell, etc, it's possible too, though sometimes through a library extension. Groovy supports IMHO the most explicite (and cool) form:

    def triple_it = {x,y -> x*y }.curry(3);

1. Our task

What we want to do is to extend the STL binder such that they would work the _$1 placeholders. I think of something like the following:

    for_each(vec.begin(), vec.end(), bind(countVector, 'X', 1.222, _$1));
    for_each(vec.begin(), vec.end(), bind(countVector1, _$1, 1.222, 'Y'));
    for_each(vec.begin(), vec.end(), bind(countVector2, _$1));

    for_each(vec.begin(), vec.end(), bind(pr_sinus, _$1*2));
    const float pi = 3.14; // well, not exactly ;-)
    for_each(vec.begin(), vec.end(), -bind(sinus, _$1*(pi/180.0)) );

We assume here that the bind function is pulled in by the appropriate namespace definition, as I want avoid to have to write it like kmx::lambda::bind(sinus, kmx::lambda::_$1). In fact, I never tried out the fully qualified form, shame on me :-(((.

How can we do that? Well there's no need to reinvent the wheel - Boost library already has it, and it's even a part of the C++0x TR.1 standard library extension (tr1::bind). So we can look at the code, or even read about the techniques needed in the implementation**. So if you want, you can spare yourself this post and read ** instead!

2. Basic mechanisms

Well, here is the bind function for the 2 arguments case:

    // 2 args
    template <class F, class A1, class A2>
        binder<F, list2<A1, A2> > bind(F f, A1 a1, A2 a2)
    {
        typedef list2<A1, A2> list2_type;
        list2_type list(a1, a2);
        return binder<F, list2_type>(f, list);
    }

What is it doing? Basically it stores the arguments to be bound in a custom argument list (class list2<>). Then the function and its arguments are used to create the instance of the actual binder class:

    template <class F, class List> class binder : public lambda_expr
    {
        F m_f;
        List m_list;

     public:
        binder(F f, List list): m_f(f), m_list(list) {}

        // OPEN TODO --> only working for void funct!!!
        template <class A1> void operator() (A1 a1)
        {
            list1<A1> list(a1); // store the actual arg
            m_list(m_f, list); // forward to curried funct: m_f+m_list
        }
    };

What is it doing now? It's a functor accepting (in the H.B.Curry's true manner) only one argument. Which one? In the context of an STL algorithm it will be the current value of the iterator. So our task is to sort out to which of the arguments of a function this iterator is bound. Take our previous example: for_each(vec.begin(), vec.end(), bind(countVector1, _$1, 1.222, 'Y')). Here the iterator has to be bound to the first argument of the function - 1.222, and 'Y' to the second and third ones. The binder object stores the actual argument in the custom argument list (class list1<>) and invokes the function m_f in the context of the bound arguments m_list, remembering the current argument in the custom argument list class list1<>. All this is accomplished by heavy use of the listN<> classes, to which we turn our attention now:

    template <class A1, class A2> class list2
    {
        // args: one of them is a placeholder!
        //  -- list1 will detect it!
        A1 m_a1;
        A2 m_a2;
     public:

       list2(A1 a1, A2 a2): m_a1(a1), m_a2(a2) {};

       // apply to arguments and placeholders!
       template <class F, class List1>
           void operator() (F f, List1 list1)
           {
               f(list1[m_a1], list1[m_a2]);
           }
    };

Well, list2<> doesn't do that much, it invokes the curried function, but the detection of bound arguments is redirected to the most important class here: list1<>! Let us have a look at it:

    template <class A1> class list1
    {
        A1 m_a1;

     public:
        explicit list1(A1 a1): m_a1(a1) {}

        // if placeholder:
        A1 operator[] (placeholder<1>) const { return m_a1; }

        // not placeholders: return the ball
        template <class T>
            T operator[] (T val) const { return val; }
    };

Well, this isn't very complicated: complier will detect when a placeholder was placed and use then the current argument (i.e. the iterator). In other cases, the stored argument value will be used, i.e. the value used to curry the function f. As the value is stored in the listN<> class, it is done via a trick: the listN<> class tries to access the placeholder's _$1 value by using the index operator of the list1<> class, but in non-placeholder cases the index value is simply returned to listN<> to be used as n-th argument (a kind of rebound!).

For the sake of completeness, i.e. in case of bind(sinus, _$1) (who'd need something like that?), we need the function call operator in the list1<> class:

    template <class A1> class list1
    {
       ...
       // apply to arguments and placeholders!
       template <class F, class List1>
           void operator() (F f, List1 list1)
           {
               f(list1[m_a1]);
           }
    };

To bind 3-argument functions we need two more templates to be added: the list3<> class and the bind(F f, A1 a1, A2 a2, A3 a3) function, for 4 arguments another two, and so on, and on... But I don't think anyone should need more than 5 arguments in their functions, do you?

3. Extensions

With the above techniques we can write bind(countVector1, _$1, 1.222, 'Y') but not bind(countVector, _$1*2) or bind(sinus, _$1*(pi/180.0)). What can we do about this? Above we wondered about the bind(sinus, _$1) case, but here we can use it to our advantage. We extend the list1<> class like that:

    template <class A1> class list1
    {
       ...
        template <class T>
            T operator[] (Mul<T>& e) const { return e(m_a1); }
   };

So we can see that the list1<> class is indeed a central one! We extended its rebound mechanism to support the multiplication. Oh, uhm, that wasn't pretty! Why couldn't we simply use a general forwading here? The reason is, I left is as a TODO in code (shame on me :-((() but never came around to implement this. I tried the following:

    A1 operator[] (lambda_expr& e) const { return e(); }

but I didn't finish it, and I don't remember if it worked and if it didn't, why. It was almost one year ago and I don't remember all the details, sorry. OK, that could be done better, but (as I said in my previous post*) "it's a simple library". If you need another operation to be supported in the bind context, simply add it to the list1<>. It remains to be clarified if the simplicity of this library is an advantage, but some measurements were reported, which say that the many layers in a fully grown-up lambda library (like Boost) have a detrimental effect on the performance, because the code gets so complex that the optimizers cannot cope with it!*** And the compilation gets longer too. So maybe such a primitive implementation isn't totally in the wrong?
There is another extension I tried. Again we have to extend the central list1<> class:

    template <class A1> class list1
    {
       ...
       // bind a void member function e.g.: int (T::*)(void *)
       template <class T, typename Ret, class List1>
           Ret operator() (Ret(T::*f)(void), List1 list1)
           {
               return ((list1[m_a1])->*f)();
           }
    };

What does it do? It is supposed to be used like that:

    vector<SomeClass> vec(10);
    for_each(vec.begin(), vec.end(), bind(&SomeClass::compute(), _$1);

So for each object in a collection its (void) member function will be called. But again, I don't know if it actually worked, as there is no testcase for that in my code (but I think it should). And not everything that's possible makes sense anyway: here the code is misleading, we'd rather like to express this with _$1->compute()!

4. Last words

Well, that's everything I can say about currying support in my library. In the next post(s) I'll show you how to support lambda expressions like: cout << "---" << _$1 << "\n", and how to access a member or a member function of an object from a collection. What else? Maybe some conditional lambda expressions and some general discussion? Look out.

A general question can be answered here: what about the code of my library? Will it be publicly available? And what about the TODOS? And the missing orthogonality? The anwer is, I don't want to (and haven't got time, for that matter) polish the rough edges and add some more orthogonality. It was a fun writing the library, it took me only a couple of days (about 4, I think) , it is simple, and it has shown me that you don't have to be scared of the template metaprogramming. Of course, I wasn't a novice with templates, but I expected more resistance from C++. Maybe it's because of the library's simplicity?

Whatever. This library is not ready, is not complete, and I leave it as it is now.

---
* http://ib-krajewski.blogspot.com/2007/12/c-pocket-lambda-library.html
** Chris Gibson, "How Do Those Funky Placeholders Work?", Overload 73: http://www.accu.org/index.php/journals/1397
*** C++0x proposal: "Lambda expressions and closures for C++" : http://www.research.att.com/~bs/N1968-lambda-expressions.pdf

Friday, 25 January 2008

SOA in practice: a YAMS?

In a recent installmet of this blog I already wrote about SOA a little*, but I said that I'm going to read a book of a highly valued author to answer a couple of questions which seemed to stem from my lack of understanding of the subject. Basically my gut feeling was, that SOA (and SOAP) is just a new hype, selling old concepts in new disguise and under new acronyms. Or, in other words: why do we need yet another middleware standard (YAMS)? On the other hand I wondered: if so many highly intelligent people are buying this (i.e. the YAMS), there must be some value in it!

As not to leave my judgements to the mercy of my gut feelings I decided to read some no-nonsense SOA book. This would be close to impossible until recently, as most SOA books were full of marketing drivel and pseudo-scientific, wishy-washy definitions. But recenly a more practical SOA book was published: "SOA in Practice" by N. Josuttis**. As for now, I've read somehow 1/3 of it, which enables me to present some thoughts based based on more solid fundaments.

OK, so was my gut feeling right? Well, both yes and no! SOA is about defining basic services (which can be compared to defining the remote procedures of old), then to combine those simple services (or RPC calls) into more complete, well... services, and then building a business process flow out of those building blocks. And this using SOAP protocol and Web-Services technology. Let me say it: there's nothing new here - nihil novi sub sole! The issues coming up when designing a SOA architecture (i.e. a service-oriented one) are all familiar to me from distributed system design, and the author himself admits it readily. So I was right in my previous post? Technically, yes, with perhaps an exception of BPEL. But???

...but I must admit that SOA indeed has brought in something new, althought in a rather unexpected way! At at the first sight, SOAis only an application of sound distributed system design in corporate settings. And here it is: in corporate settings! As it seems, that wasn't so common before, the organisational and departamental issues were most important then, and now system design issues are on par with them! And that's a huge step forward. Apparently, you need a massive hype vawe to convince corporate people to use some common sense when designing systems!*** And as a Garnter analyst coined the SOA term**** and other Gartner analysts hyped it, the busines people started to hear. And this, and not the usage of SOAP protocol or Web-Services is the real breaktrough of SOA.

PS: And, of course, SOA is SOAP and WS-* in practice, just as I conjectured*, contrary to what you'd read in some high-brow SOA books.

---
* here I tried to answer the question why SOAP and Web-Services aren't simple (which is the first letter of the SOAP acronym!): http://ib-krajewski.blogspot.com/2007/11/web-services-soa-and-rrest.html
** book's homepage: http://www.soa-in-practice.com/
*** or putting it more friendly: there's a huge chasm between business and technical people!
**** SOA-definition by Gartner: http://www.gartner.com/DisplayDocument?ref=g_search&id=302868&subref=browse and http://www.gartner.com/DisplayDocument?ref=g_search&id=302869&subref=browse

Monday, 31 December 2007

Some jokes for the new year

Happy new year everybody!

What about some (programming) jokes for a smooth beginning of a new year? Let's start with an old one:

How many software developers does it take to change a lightbulb? 10 to discuss the requirements, 10 more to do the analysis, 10 more to do the design, and one to write the code, 12 years later.

Well, that's definitely a "waterfall-model" one! Do you know some XP-jokes? I don't think there are any, as Agility (with the capital A!) is much to cool nowadays to be joked about ;-)! So, as not to let you think I'm a hopeless oldtimer, here is some more modern one, actually a cartoon* :

Well, a discussion could begin about Java, its low-level interfaces, its megatons of frameworks, etc... but not this time.

And here's another one, this time more higher-order and functional (Haskell, if I'm right)**:

Isn't it cute?

And, at last, hava a look at a day in programmers life: http://www.muza.pl/7001.html?cc=0.6113369317218645&mode=image&img=0#gallery. It's really hard sometimes!!!

PS: If you know some XP or Agile jokes, please let me know!

---
* sorry, don't remember the source :-((
** source: http://arcanux.org/lambdacats.html, thanks to Philip Wadler ;-)

Saturday, 15 December 2007

C++ pocket lambda library

What I planned to do at first was to write an article for my old homepage to describe my private C++ lambda library implementation, as I wanted to jazz up my web-site (and to brag a little too...). And only because I couldn't write that article this very blog was born. So at last I'm going to do what I should have done long ago, despite of the pre-christmas madness, and the fact that I've got to find a new project middle in the holiday season!

1. The Problem

I like STL. But one thing always drives me crazy! In C++ you cannot write this:

  for_each(v.begin(), v.end(),
           new class<T> Homomorphoize { void operator()(T* t){do_it(t);}( ));

nor even that:

  class<T> Homomorphoize { void operator()(T* t){do_it(t); };
  for_each(v.begin(), v.end(), Homomorphoize() );

You must declare the doer Homomorphoize functor in the global scope, as to be able to use it in the for_each template instantiation. Yada, yada, yada? Let's see what it means in practice. One of my C++ files looked roughly like that:

  //--- helper classes: no inner classes in C++ :-((
  ...
  class Finalizer
  {
    public: void operator()(Worker* w) { ...
    ...
  };
  class Starter
  {
    void operator()(Worker* w) { ....
    ...
  };
  class NotifSrcFilter : public Worker::TestEventObj
  {
    virtual bool test(D_ev* ev) const { ....
  };
  
  // --- main
  main() {
    ...
    // and start
    for_each(m_workers.begin(), m_workers.end(), Starter());
    ...

And as I like STL rather much, there will be always a couple of such classes at the start of each of my C++ files. So what is so wrong with it? Certainly you could live with this, couldn't you? Well, I don't like the separation of code parts pertaining to a single task. Typically the functors so defined are rather short and will be used in a one single place. So IMHO it's a waste of time looking up the functor definition from the place where it is used in some STL algorithm. And it always annoys me when I'm reading the code: the long prelude which must be skipped to arrive at the real code!

2. The solution

So what could be done? There are several alternatives.

Write the code in Perl, Python, Groovy, etc: these languages have some support for lambda expressions, and many of us would readily jump to the occasion, however in commercial environment it is not always an option. And beside this, not everything would be possible in Python, for example, as it doesn't support full-blown lambda functions, only lambda expressions! In Java there are anonymous classes and they can be used to do exactly that, but they aren't fully functional closures too*.

Write an general functor which can be parametrized before it's used in an STL algorithm: I think I've seen such an implementation in the "Imperfect C++" book, but I was not really convinced. For brevity's sake I won't discuss this technique here.

Use the Boost lambda library: it's exactly what we need here! However... it's not a part of the standard C++ library, and so it's sometimes not an option in commercial environment! And it's big: did you look into the code? In includes other parts of Boost and is not very comprehensible (I didn't understand much)! We'll see later that this is inevitable as the complexity of the library increases, but I don't want to do everything what goes! There is only some subset of the general lambda functionality which I'd use in many programs.

So the result of this investigation was a decision to write small, lightweight and comprehensive lambda library for my own usage. I simply wanted to clear up my code even in a Boost-free environment!

The first thing was to get the requirements right: in this case, to decide what are the most often use cases are. I had a look at my code and found and my initial idea was to do some basic things like:

  find_if(vec.begin(), vec.end(), lambda(_$1 <= 2));
  for_each(vec.begin(), vec.end(), lambda(cout << _$1 <<  "\n"));
  sort(vec_ptr.begin(), vec_ptr.end(), lambda(*_$1 <= *_$2));
  find_if(vec_ptr.begin(), vec_ptr.end(), lambda(_$1->getValue() != 7));
  for_each(vec_ptr.begin(), vec_ptr.end(), lambda(delete _$1));

As I said, we don't want a big, complete lambda library, only a pocket edition. Here you can see the lambda function denotation, an idea which I abandoned rather soon. Frankly, I didn't know how to implement it. I knew from general descriptions of Boost lambda library, that the lambda expression should be generated by compiler via operator overloading not via a function**. So let's continue with the operator overloading idea. But how exactly should the overloading code be working?

Then Bjarne came to the rescue: he presented a following technique in as late as 2000***:

  struct Lambda { };  // placeholder type (to represent free variable)
  template<class T> struct Le // represents the need to compare using <=
  {
     // ...
     bool operator()(T t) const { return val<=t; }
  };

  template<class T> Le<T> operator<=(T t, Lambda) { return Le<T>(t); }
  
  // usage:
  Lambda x;
  find_if(b,e,7<=x); // generates find_if(b,e,Le<int>(7)));
                    // roughly: X x = Le<int>(7); for(I p = b, p!=e; x(*p++));

Do you see how simple it is? The Le<> class represents the comparison and is generated whenever a comparison is needed. How does compiler know is it needed? By the templated <= operator which is invoked as soon as compiler sees the usage of an Lambda typed variable. So the Lambda variable is just a bait for the compiler to lure it into generating our special Le<> class! This is the reason why it's referred to as a placeholder variable.

Having clarified that, we can write code to do the following:

  find_if(vec.begin(), vec.end(), _$1 <= 2);
  find_if(vec.begin(), vec.end(), _$1 == 1);

OK, tacitly we've done some design decisions: we won't use the lambda denotator, and we don't need to declare the placeholder variable, as it'll be part of the library. Moreover, we decided how the placeholder variables (yes, there will be more of them) are named. As I'm coming from Unix and shell tradition, the $1 syntax seemed more natural to me than the functional languages "_" placeholder as used in Boost. But the $1 syntax made my assembler suffer (sic!) and I decided for the middle ground: _$1!

3. The fun begins

Encouraged by the initial success we'd like to write more lambda expression using the same basic technique. Let us try to implement this:

  vector<int*> vp(10);
  find_if(vp.begin(), vp.end(), *_$1 );  // hack: means find if not 0

i.e we'd like to dereference the actual iterator. Well, this requires a somehow different approach: an operator on our placeholder variable. Not very difficult stuff. We extend our placeholder class like this:

  template <int Int> class placeholder : public lambda_expr
  {
    // ...
    public: Deref operator*() { return Deref(); }
  };

  placeholder<1> _$1;
  placeholder<2> _$2;   // etc...

So the * operation on a placeholder returns an instance of the Deref class, which looks like this:

  struct Deref : public lambda_expr
  {
      Deref() {}
      template<class T> T operator()(T* t) const { return *t; }
  };

i.e. it will, in turn, dereference its argument (the iterator) when invoked by STL via the function call operator! Simple! In the same manner we can define an Addr<> class and overload placeholder's address operator, which allows for the following code:

  // init a vector of integers
  vector<int> v10(10);
  for_each(v10.begin(), v10.end(), _$1 = 7);
  
  // construct an index vector
  vector<int*> vp(10);
  transform(v10.begin(), v10.end(), vp.begin(), &_$1);  // store ptrs to v10!!
  sort(vp.begin(), vp.end(), *_$1 <= *_$2);  // now sort the index instead of the data

Cool, we have obtained pointers to all elements of v10 and stored it away! And before we initialize the source vector elements to a value of 7! How did we do it? As the astute reader will probably know, we defined an Assgn<> class, which will be returned by the overloaded assign operator of the placeholder<> class. The sort using dereferenced comparison is a piece of cake for us! Or is it? Well, here we have placeholders on both sides of the comparison. In math-speak we no more need an unary operator on placeholders (like: *, & or ++), wee need a binary one! So let's define it:

  struct Le2 : public lambda_expr
  {
      Le2() { }
      template<class T> bool operator()(T a, T b) const { return a <= b; }
  };

  Le2 operator<=(placeholder<1>, placeholder<2>) { return Le2(); }

Not really different from what we've had before, is it? But wait, didn't we forget something? What about dereferencing the iterators before comparison? Well, we need one layer more for that:

  template <class S, class T> struct Le2_forw : public lambda_expr
  {
    S e1;
    T e2;

    Le2_forw(S s, T t) : e1(s), e2(t) { }
    template <class U> bool operator()(U a, U b) const { return e1(a) <= e2(a); }
    // ...
  };

  template <class S, class T>
    Le2_forw<S, T> operator<=(S s, T t) { return Le2_forw<S, T>(s, t); }

Now we are using the forwarder Le2_forw<> to sore any expression used on both sides of the <= comparison, remember them an then invoke them when the final comparison is done. Uff! OK, but this one will work without forwarding layers!

  find_if(vp.begin(), vp.end(), *_$1 == 1);

Or at least it should: we defined the operators for comparing placeholders to a value and dereferencing them! Well not, in the simple form as they are coded, they cannot be combined! We need to write an extra class to handle this combination:

  template<class T> struct EqDeref : public lambda_expr
  {
    T val;
    EqDeref(T t) : val(t) { }
    bool operator()(T* t) const { return val==*t; }
  };

  template<<lass T> EqDeref<T> operator==(T t, Deref) { return EqDeref<T>(t); }

Exactly as in the forwarding case above. A little annoying, isn't it? We need forwarders and/or combined operators en masse! But on the other side, it's still simple, annoying but simple. When you compare this to more advanced and layered, orthogonal implementation there is one clear advantage: simplicity. If I cannot compile some expression, I can extend the code without any problems. It's a simple library.

4. And the fun ends

Everything is so simple! Let's try some more easy lines:

  for_each(v.begin(), v.end(), cout << "--- " << _$1 << "\n");

Uh-oh... Now we don't even have an binary or ternary operator - we have an unary operator which can be chained and becomes effectively an n-ary one, where n is unlimited!

Well, how about this:

  struct XXX { int value;  bool isSeven() { return value ==7;};  };
  vector<XXX> vx(10);
  find_if(vx.begin(), vx.end(), _$1.value == 7);
  find_if(vx.begin(), vx.end(), _$1.isSeven());

We cannot overload the . operator in C++ as for today. But this is obviously a very useful piece of functionality, so maybe we can come up with something similar?

And then, let's us be brave and try this:

  for_each(v10.begin(), v10.end(), if(_$1 == 1, _$1 = 7));
  for_each(v10.begin(), v10.end(), if(_$1 == 7, ext_counter++));

i.e. to execute an action depending on the value of the current element under the iterator. And what about function currying and closures? Arrgh, problems... But they can be solved. Look out for the part 2 of this article!

PS: As you could see, there is room for improvement in the implementation, but I only wanted to get the code work.

---
* you can write this in Java:

  Collections.sort(anagrams,
                   new Comparator<List<String>>() {
                         public int compare(List<String> o1, List<String> o2) {
                           return o2.size() - o1.size();
                         }
                       });

cool, I like it! But you cannot change any variable from the outer scope, as there are no closures in Java (∀ j∈J: j <= 6)! So the ext_counter example from above wouldn't work, and besides: the Java algorithms library is no match for STL! (comments encouraged...)

** Now I think that it would be perhaps possible to add the lambda denotation, but only to discard it in the process and go on with operator overloading.

*** Speaking C++ as a Native (Multi-paradigm Programming in Standard C++): http://conferences.fnal.gov/acat2000/program/plenary/STROUSTRUP.pdf.

Sunday, 4 November 2007

Web Services, SOA, and the (r)REST

OK, it's time to talk about SOAP and web-services (and many other things ;-). Why now? Well, I want to write about it before I start reading Nicolai Josuttis' book about SOA, just to record my current (and possibly superficious) impressions.

Beacause Nicolai is known to me as author of fine C++ books, I expect rather much from this one, and suspect that its lecture will shutter some of my safe and secure beliefs. If thad be so, I'l write a new blog to restore the honour of SOAP. But now, ad rem! Down to business!

I remember as a couple of years ago I was receiving one email after another: new SOAP-based protocol A invented, new SOAP-based protocol B approved, new SOAP-based protocol C proposed. And I was wondering - why are the people inventing the wheel for the n-th time afte RPC, DCE and CORBA? Why so much protocols? One is daunted by the sheer number of needed specs, regulative bodies, and documents. Wasn't is supposed to be about simplicity? It's Web! Here I'll cite my first source - "Dave Podnar’s 5 Stages of Dealing with Web Services"*:

Denial - It's Simple Object Access Protocol, right?
Over Involvement - OK, I'll read the SOAP, WSDL, WS-I BP, JAX-RPC, SAAJ, JAX-P,... specs. next,I'll check the Wiki and finally follow an example showing service and client sides.
Anger - I can't believe those #$%&*@s made it so difficult!
Guilt - Everyone is using Web Services, it must be me, I must be missing something.
Acceptance - It is what it is, Web Services aren't simple or easy.

So, at it seems it was about simplicity in the beginnings,and then it got more and more complicated. Why? As a SOAP book author** said: "Web Services are hard because they are a form of distributed computing and distributed computing is just about the hardest problem in computer science"!

Well, of course it's about distributed computing, but let me make my point here: web-services are got so complicated because they are too low-level a model of distributed computing! SOAP is basically RPC done with XML, and PRC deals with location-transparent procedure calls. This paradigm never had a lot of success - look at the demise of IIOP and RMI - because it's too fine-grained. Compare this remote call model to other model of distrinuted computing - Google's Map-Reduce system (to my mind the most successful distributed computing system today)***. It works at totally different level: it's about gathering and transforming information in a distributed manner, hiding away details like remote calls and node failures!

Now enter REST (representational state transfer)****, one another model for distributed computing. Let's explain it to those of you doing embedded programming or databases: REST is the principle the Internet is acting upon. There is a set of ressources which we can acces using URL's, i.e. only knowing their location. So, instead of using UDDI+WSDL+SOAP we just send:

an HTTP GET request for an URI like: www.marek-here.de/mareks_blog?title=first%20blog
or, as Ruby on Rails uses it: : www.marek-here.de/mareks_blog/1
or a POST with: www.marek-here.de/mareks_blog?&title=new%20post&text=bla%20bla%20bla

and we'll get the data (usually XML) in response, or we'll create a new blog entry. Simple? Yes, it is! Compare it to the 5 stages of web services!

Well, one must admit, there's a lot of public SOAP interfaces in the Web at the moment: look at Google's and Yahoo's maps, Amazon's bookstore and Ebay auctions. But these interfaces are increasingly perceived as old-fashioned, because RESTful services scale better and are simpler to use. For example Google discontinued its SOAP-based search API and is rumored to be striving to get away from SOAP-based APIs entirely. As for Yahoo and Amazon, which offer SOAP and REST based API in parallel, 85% of their users allegedly choose REST.

It is perhaps a good place now for some well-known citation. It has aleady made its round over the Web, but I'll use it here nevertheless, as it is simultaneously fun and instructive*****:

"WS-* is North Korea and REST is South Korea. While REST will go on to become an economic powerhouse with steadily increasing standards of living for all its citizens, WS-* is doomed to sixty years of starvation, poverty, tyranny, and defections until it eventually collapses from its own fundamental inadequacies and is absorbed into the more sensible policies of its neighbor to the South.

The analogy isn’t as silly as it sounds either. North Korean/Soviet style “communism” fails because it believes massive central planning works better than the individual decisions of millions of economic actors. WS-* fails because it believes massive central planning works better than the individual decisions of millions of web sites."

Let me close this post with a little story. At the BEA's Arch2Arch Summit in Frankfurt, Germany, this summer I attended the panel discussion about SOA with some of high-level BEA people. BEA is a middleware company, Java-centric and very much into web-services. Of of the questions at the panel discussion was the simple and succint: "Why do you think SOA is so good?". The answer of the president of the German subsidiary was: "Because it's about customer orientation!". The answer of BEA's chief architect was: "Because there's no programming language binding (as in CORBA) - the protocol is language-neutral!".
After hearing that I asked myself: is that something really new? Isn't industry repeating itself? It was the same feeling I described at the start of this post, when I heard about new SOAP protocols: are the people trying to solve something, or to generate more buzz and revenues? But my gut feeling may be wrong - admittedly SOAP is too low level as an abstraction, but SOA is building a layer on top of it. On the other hand SOA is normally implemented using SOAP. So SOA=SOAP=WS* in the end? And why is corporate world loving it? Is it doomed to failure like North Corea? Could we do SOA with RESTful services?

PS: I updated my August-07 "iPhone Presentation" post. After I've read Nicolai's book, I'll probably update this one too.

---
* see http://rmh.blogs.com/weblog/2006/04/dave_podnars_5_.html and discussion http://www.theserverside.com/news/thread.tss?thread_id=40064
** books homepage (http://soabook.com/), and excerpt http://soabook.com/blog/?p=5
*** for example see the presentation: "Using mapreduce on Large Geographic Datasets (Barry Brumitt, Google)", http://video.google.com/videoplay?docid=6202268628085731280 "
**** defined in Roy Fielding's dissertation: "Architectural Styles and the Design of Network based Software Architectures, 2000", www.ics.uci.edu/~fielding/pubs/dissertation/top.htm
***** see here for the post and discussion: http://cafe.elharo.com/xml/north-and-south/

Tuesday, 30 October 2007

What can you learn from the bad guys?

Really, is there something you can learn from them? Confucius said some time ago: "I'm always glad to meet another voyager: if it's a pleasant man, I can learn what I can do better, if it's an unpleasant man, I can learn what I shouldn't do!". But can we learn something in apositive sense? In order to be able to answer this, let us return to the realm of IT.

What do the really bad IT-guys do? They break in into systems! They use a host of techniques, and one which is particularly interesting is the "SQL injection" (I personally prefer the more colourful - if slightly incorrect - name "SQL hijacking"). Let's explain it using an SQL example. If we build an SQL query string like this:

query = "SELECT * FROM users WHERE name = ' " + userName + ' ";

Now, if we supply an user name string like, say "dummy' or name not null --", the resulting SQL query looks like*:

SELECT * FROM users WHERE name = 'dummy' or name not null --'

And this will get you the list of all users in the system! If the submitted username is like "xxx'; DROP TABLE users; --" the attacker can even get nasty on the database! Simple but effective!

So what, you'd say, how many bad guys will have direct access to my SQL database? Well, you'd be surprised! If you have a Web application which accesses an SQL database the following scenario is conceiveable. The bad guy invokes your SQL database indirectly requesting an URL in the form:

your.site.de/your_app/ListUsersAction.do?ID=11;UPDATE+USER+SET+TYPE="admin"+WHERE+ID=23

which, when appended to the "SELECT * FROM users WHERE id=" string will grant the attacker the admin privileges! Or, when he requests the URL in the form:

your.site.de/your_app/ListUsersAction.do?ID=11;GO+EXEC+cmdshell('format+C')

the disaster is looming! I don't want to discuss this in detail, as it's not a hack instruction page, but I hope you've got the general idea: this a real problem for the Web application development. Well, that's all very bad behaviour and all, but the question is: why should we learn anything about this?

Recently I was charged with extension of a J2EE web application for one of my clients.They were using a proprietary custom tag JSP library**, whose sources were not available (unfortunately, in a big company that's not so unusual). The custom tag library was rather nice, but as the code was gone, it was not extensible. For example, I couldn't specify the onClick attribute for the custom button tag:

<somelib:button action="someapp/SomeAction.do" image="..." ... ⁄>

Can you see it? Think, think... Yes! Let us try some JSP/Javascript injection:

<somelib:button action="someapp/SomeAction.do \" onClick=\"alert('Gotcha!'); \" " image="..." ... ⁄>

which produces, when iterpreted to HTML, a wonderful:

... url="someapp/SomeAction.do" onClick="alert('Gotcha!')" image="..."

The reason why this is working*** ist the fact that \" isn't interpreted as the end of string sign, as it is escaped with a "\" ! So we have a kind of double bottom here - on one level we have a regular string, which, when interpreted, is transformed into something different sematically - it's the second bottom. Isn't it beautiful? On one level (tags) we have a senseless string, which on the lower level (HTML) gets injected into a page code and starts living its own life! Like a seed coming to life after a hibernation phase. Since then I used this technique in several other places, and it never ceases to amaze me. Look at that: <somelib:help key="start_task" tooltip="Help \" align=\"right\" "/>.

So now we can answer the title question: from the bad guys, we can learn how to extend a non-extensible JSP custom tag library! And yes, Confucius was right: we can learn from every person. Indeed.

---
* here "--" comments out the last apostrofe, but we might as well add some ineffective condition like "and name='yyy" to balance the apostrophes.
** for those of you not doing J2EE: a custom tag is a sort of macro generating HTML code when interpreded on the server side
*** this works on Tomcat, and as Tomcat is the reference implementation of a Servlet container... guess what.

Saturday, 20 October 2007

Some plans, or what's to be expected

...from this blog? As this blog is stalling a bit recently (I began several articles in parallel and they are each in a different phase of completion) I felt like it would be a good idea to reinstate the purpose of this blog. For me: to motivate myself to complete at last some of the most promising articles, and for my readers (yes, don't laugh, I know that nread >= 1!): to get some perspective if it'd worth reading this blog for the next time.

When I started, I had two purposes on my mind:

1. to write about some of my technical work which I found interesting
2. to give some interesting vistas on some general topics in SW developement - provided I can find some ;-)

So you can see this blog wasn't intended as uncoordinated blatherings, but rather a limited collection of articles: I simply haven't done that many interesting things or found that many new ideas. This can be explained by the fact that this blog is a sort of replacement for my old, nonfunctional, dilapidated homepage, which I really should update sometimes soon. But these days starting a blog is much simpler than re-working an old homepage (it's Web 2.0 now!). So let's extemporize a list of topics I've got in the pipeline:

1. Technical topics

design and implementation of my own small lambda expression library in C++ (as it was fun)
modal logic and tesing of multithreaded application (as it shows the limits of knowledge)
using lock-free synchronization (as C++ programmers always worry about performance, maybe to much?)
subtle Java vs C++ differences (as I like language-design discusssions)

2. General SW-engineering topics

why is everybody keen on scripting languages (sociology applied ;-)
Python's language design (coming to terms with Python hype)
waterfall and XP methods (a somewhat different comparision)
web services, SOAP an the (r)REST
does speed really kill in SW-engineering?
the new antipattern (some ruminations on singletons)
C++ myths in the Java community (as most of the Java people know C++ only from fairy tales, I think)
from DLL to XML hell (sometimes it must be a rant)
Idiot's sorts (I mean sort algorithms...)

etc...

OK, let's get real. I'd be happy to complete the blogs on the three first topics from the list 1. and the four first topics from the list 2. In addition I'll sometimes do a "follow up" on some earlier blog entry. The one I must definitely write is a follow-up on the "Google Solvers"* blog: there is defintely something that has to be added to that one! Sometimes I'll simply extend some older article if it isn't too long, for example the "iPhone presentation"** blog: at the time of its writing I purposely omitted examining any of my guesses, now it's time to compare my intuitions with reality. And there are some other topics I've forgotten, but I may remember again sometime. As my distinguished colleague Stefan Z.* would say "Software engineering is like war, ...", so I'll keep on fighting.

----
* http://ib-krajewski.blogspot.com/2007/07/google-solvers.html
** http://ib-krajewski.blogspot.com/2007/08/iphone-presentation-with-afterthoughts.html