Monday 21 December 2009

Local language trends 2009

I just had a look at the statistics pages of our local IT freelancer website (http://www.freelancermap.de/) and found the results pretty interesting. First let's have a look at the languages which are most popular in freelance projects. The picture below shows the languages, the supply of programmers (the blue bar) and the demand in projects (the red bar).

As it seems, here in Germany there are only two serious languages: C/C++ and Java. Well, OK, Visual Basic and C# are sought after rather strongly (red bars), but there seems to be too little supply. Is it the fear of a vendor lock-in, or is it just because Microsoft is considered evil that programmers don't like .NET? It's the same case of too little supply for PHP, but here I must mention that the hourly rates for PHP tend to be rather low - is PHP still considered a hacker language?


And now look at the last but one language with a longwinded German name I won't retype here (too lazy). In fact it's not a language of its own, but rather denotes "all the other ones in there", which includes everything fancy (or just plain old). You see there is too much supply; the IT managers seem to be more conservative than the programmers, no surprise here.

Which leaves us with C/C++ and Java. Java is unmistakably the 500 pound gorilla here, but it's hopelessly over-supplied (blue bar)! For C/C++ the situation isn't so dramatic, and I'm glad I specialize in C++!

The proposed course of action here: go away from Java (or well, from everything else for that matter) and start on .NET!

Now the overview of technologies in the freelancer projects:

Software development comes first (still a little room for more work - red bar!), then system management (plus "service and support"), however the supply here is much too big (blue bar)! It's just as bad for SAP programming - in the past the eternal cash cow with ridiculously high hourly wages, now it's a real "problem child" - the market seems to be breaking down. Banking crisis anyone? For Web development and "graphics, content and media" there's much too much demand, just like in the PHP case above. Then the ominous field of "IT consulting", where I have no idea what it actually is, sorry, no comments there :-/. One interesting field is "Executive consulting" with so much demand and so little supply; maybe everyone is just too shy to position themselves in that niche?

The proposed course of action here: either stay with software development or move into "Executive consulting"!

Having said that: Merry Christmas you all!

Wednesday 16 December 2009

Even more new C++ features!

Hi everybody! As I had a look at the new C++ proposal last time, I found many small interesting things apart from the big, important features. But that was only a first look. Recently I had another look there (and at the C++0x FAQ* by Bjarne Stroustrup as well) and discovered even more interesting critters. So what did I miss?

1. Some Syntax

First of all the new function definition syntax! Now I can write something like*:

  using PF = [](double) -> void;
along the lines of a lambda function definition! Well, that's definitely a treat, what a relief from

  typedef void (*PF) (double);

To be honest, I don't know if I've got the last one right; I always have to look this syntax up or copy and paste it from my other code! The [] notation can generally be used to tell the compiler that "the function type will be specified later"! Thus we can write the following when using an embedded type like List::Link:

  [] List::erase(Link* l) -> Link* { ... } ;
We postpone the return type definition until we have entered the scope of List, and thus no scope specifier is needed!

I liked this notation (it makes the code look a little Groovy-like), but alas, it seems that not everyone does, as the related "auto arrow" notation proposal was rejected recently**! What is auto-arrow? Look at this***:

  auto f() -> int;
typedef auto PF() -> int;
The first one is a function from void to int, and the second one is a function type definition for the same signature. Looks good enough, but consider this one:

  auto *pf(auto *p() -> int) -> auto *()->int;
I don't even want to discuss what it means, if you are curious look it up in ***! So after that proposal was justly rejected, I'm a little afraid of the fate of the [] notation - is it the next one to be scrutinized? I hope not!

2. Initialization and initializers

I thought I'd understood the new initializer notation, but I was surprised by how ubiquitous it became! At first I thought it would only be used in assignments like

  vector<int> xs = {1, 2, 3 };
Not so, nowadays you can use it (i.e. the new {} notation) everywhere, even instead of classic parentheses!

  X x{a};
f({a}); // initializes a temporary
X* p = new X{a};
This all does look pretty strange to me, and to be honest I don't like it that much, but on the other hand you can use it to avoid verbosity with pretty good results:

  map1.insert({a, b}); // sic!
The other improvement with initialization is that we can initialize the string and integer class members directly in the definition of the class!

  class X
  {
      int a = 7;
      string s{"abcd"}; // new syntax!
  };
I can only say AT LAST and amen!!!

Belonging to the "initialization" block of features, we must also mention access to inherited constructors (in addition to the other new feature of delegating constructors, i.e. constructors which can be invoked by other constructors of the same class):

  class A { public: A(){} };
  class B : public A
  {
      using A::A; // imports all the A's constructors into current scope!
  };

  // now this is possible:
  B b;
At this point I'd like to mention another related feature: disallowing copy and assignment operators. You can write, for example:

  X& operator=(const X&) = delete;
to disallow it! Well, if I consider that Google even has a macro exactly for that purpose (DISALLOW_COPY_AND_ASSIGN(), see here) and that I used a UML tool to generate the private version of it for my every class, I think that generating them by default was a bad language design idea! If we want copy semantics, we'd better tell the compiler so! Hindsight is 20/20.
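
As a side note, the same mechanism works for the whole set of special member functions. A minimal sketch of my own, using the draft's = default / = delete syntax:

  class NonCopyable
  {
  public:
      NonCopyable() = default;                             // keep the compiler-generated default constructor
      NonCopyable(const NonCopyable&) = delete;            // no copy construction
      NonCopyable& operator=(const NonCopyable&) = delete; // no copy assignment
  };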

3. Threading etc.

Of course thread, mutex, condition variables and atomics are there as basic support (as expected). On the higher level we have futures and a packaged_task class, which wraps futures for simpler usage. Moreover, Bjarne promises that an async task launcher (logically named async) will be included in C++0x as well. Then we could launch tasks simply like that:
  auto f1{async(funcX, 100)};
  auto f2{async(funcX, 200)};
  return f1.get() + f2.get(); // waits on both futures
If he's right it would be very nice!
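
Until async arrives, the packaged_task class mentioned above already lets you do much the same by hand. A small sketch of how I read the draft's <future> interface (so treat the details as my assumption, not gospel):

  #include <future>
  #include <iostream>
  #include <thread>
  #include <utility>

  int compute(int x) { return x * x; }

  int main()
  {
      std::packaged_task<int(int)> task(compute);   // wrap the function
      std::future<int> result = task.get_future();  // get a handle on its (future) result

      std::thread t(std::move(task), 7);            // run it on another thread
      std::cout << result.get() << '\n';            // blocks until the result (49) is ready
      t.join();
  }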

As GC is connected to multithreading in an intimate way (at least in my eyes, I should blog about it some time soon!), I must state here that GC is still missing!

But not entirely: we've got some minimal support for third party garbage collectors in there (20.8.13.7 Pointer safety). It defines what a "safely derived" (i.e. reachable) or a "disguised" pointer is (3.7.4.3 Safely-derived pointers) and lets the programmer specify where such pointers can/cannot be found with the declare_reachable()/declare_no_pointers() functions.
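
As far as I understand the draft (so take this as my reading, not a tested recipe), the intended usage looks roughly like this - without an actual collector the calls are effectively no-ops:

  #include <memory>

  void hide_pointer_example()
  {
      int* p = new int(42);
      std::declare_reachable(p);       // promise: the pointed-to block stays reachable
      // ... p may now be "disguised", e.g. stored XOR-ed with some key ...
      p = std::undeclare_reachable(p); // and here we get back a safely-derived pointer
      delete p;
  }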

One thing that struck me in the threading library classes, however, is the flexibility of the interfaces. Take std::thread's constructor for example. You can use it like this****:
  void hello_func() { ... }
// say hello
std::thread t{hello_func};
or like that, using a functor:
  class Greeting
  {
  public:
      explicit Greeting(std::string const& message) { ... }
      void operator()() const { ... }
      void greet(std::string const& message) const { ... }
  };
// say goodbye
std::thread t(Greeting("goodbye"));
or currying the thread function with std::bind:
  void greeting_func(std::string const& message) { ... }
std::thread t(std::bind(greeting_func, "hi!"));
or without std::bind:
  std::thread t(greeting_func, "hi!");
or without std::bind and with more arguments:
  void write_sum(int x, int y){ ... }
std::thread t(write_sum, 123, 456);
or using the member function pointer
  Greeting x;
std::thread t(&Greeting::greet, &x, "greet");

// or by a smart pointer
std::shared_ptr<Greeting> ptr(new Greeting("hello"));
std::thread t1(&Greeting::greet, ptr, "hello");

// or by reference
std::thread t2(std::ref(x));

// or directly!
std::thread t3(x);
See, you don't need to worry about the initialization, the new constructors will just take anything you throw at them!

4. Miscellanea

First, the variadic templates. What a name for templates with a variable number of arguments! I guess it's an interpolation of the "dyadic" concept, which just means "of two arguments" :-). But seriously, we can write things like
  template<typename Head, typename... Tail>
doing recursion and type matching like in Clojure or Haskell ;-) Well, kind of, but it looks like that at least. Look at the n-tuple definition using variadic templates in Bjarne's FAQ*.
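
To make the Haskell comparison concrete, here is a minimal sketch of the recursive pattern (my own toy example, not one from the FAQ):

  #include <iostream>

  void print() {} // base case: the argument pack is empty

  template<typename Head, typename... Tail>
  void print(const Head& head, const Tail&... tail)
  {
      std::cout << head << ' ';
      print(tail...); // recurse on the remaining arguments
  }

  // usage: print(1, "two", 3.0); prints: 1 two 3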

Further, the new and improved enums (which are class based now) can now be forward declared! But I'm doing this all the time with C++ in Visual Studio 2005! Ouch...

Now for the geeky ones: we have user defined literals! You can define new literal types using the following new operator type: operator""() (plus constexpr!), for example

  "abc"s; // s: string literal
2i; // i: imaginary literal
For details have a look at Bjarne's FAQ*. Nice feature, but will I use it? Is it readable for the maintainer? We'll see.
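
Just to see what such an operator definition could look like, here's a small sketch of my own (the _km suffix is invented, and the exact rules may still change):

  // a literal suffix for kilometres, stored as metres
  constexpr long double operator"" _km(long double d)
  {
      return d * 1000.0;
  }

  // usage: auto distance = 1.5_km; // 1500.0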

The next 2 features look to me as if they came straight from Python. First the raw strings:

  string s = R"[ahjh/?=(%/$§%§%\w\\w///]";
Why not just R"..."? Dunno. Next, the foreach variant of the old for() loop. Together with initializers we can now have some Python feeling in C++:

  vector<string> xs = {"aba", "bab", "aab" };
for(auto x:xs) cout << x; // x:xs looks even a little Haskell-like ;-))

So there are many new things, some of them making C++ look like an entirely different language. Now I wonder - will C++ programmers use them, or will they stick to the old, but known evil (like the old function definition syntax)?

---

*Bjarne's "C++0x - the next ISO C++ standard", http://www2.research.att.com/~bs/C++0xFAQ.html

** Danny Kalev: "The Rejection of the Unified Function Syntax Proposal", http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=460

*** Daveed Vandevoorde: "The Syntax of auto Declarations", http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2007/n2337.pdf

**** for completeness' sake, the examples are partially taken from: http://www.justsoftwaresolutions.co.uk/threading/multithreading-in-c++0x-part-1-starting-threads.html


Tuesday 24 November 2009

More about Java Closures

Surprise, surprise! Hear, hear!
One year ago, Mark Reinhold, Principal Engineer at Sun Microsystems, announced at the Devoxx conference in Antwerp, Belgium, that the next major release of Java, JDK 7, would not include closures. At the same conference this year, however, Reinhold announced in a surprise turnaround that Java would be getting closures after all in JDK 7.*

More specifically, they adopt the BGGA proposal (look here) but without the non-local return (look here) and control invocation statements. This leaves Java without any destructor or C#-style using statement support!

What can I say? I can only restate what I said in my previous post: compared to the development of the C# language (look here** for an interesting interview about LINQ and its supporting language features) the Java language looks increasingly mouldy (ouch!!!).

But maybe I'm mistaken? Maybe there will be some support for modern resource management in Java, but it's decoupled from the closure proposal? Let's hope so.

Update: well, I didn't hope in vain, there will be a feature for this: Automatic Resource Management (see here)!!! Moreover, we'll get a little bit of type inference for the right hand side of generic expressions with new (as Google Collections provides it) with a "diamond" operator (?) <>, and strings in the switch statement (as in Groovy). All in all you cannot complain, I'd say, and some bloggers (ehm...) shouldn't be too quick to bash Java 7!

--
* I found the news here: http://www.artima.com/forums/flat.jsp?forum=276&thread=274496
** http://www.infoq.com/interviews/LINQ-Erik-Meijer, BTW I found an interesting quote of general interest there:

Actually, if you look at where I'm going, I'm getting more and more into old school object orientation with interfaces and so on, going away from functional programming. If you are passing a high order function you are passing one thing, one closure. If you're passing an interface or a class, you are passing many because you are passing a whole V table. These can all be recursive and so on. I'm rediscovering object oriented programming and I feel that maybe I wasted some of my time doing functional programming and missed out on objects.

You see, it's not only functional programming which is gonna be the Next Big Thing! I think that we cannot just condemn OO and go all in for the functional. OO has its merits when structuring code, and a hybrid OO/functional approach seems to be a good thing to me!

Wednesday 11 November 2009

Songs in code

Won't explain what they are, because it's obvious. These two made me laugh:

Bob Marley's (and Groovy's):
  4.times{
!woman == !cry
}
and REM's (and SQL's):
  DROP TABLE MyReligion
but normally they are boring, uninspired or plainly wrong, like this one:
  while(young)
{} // loops forever
Come on, the whole point is that young is a volatile variable, which will somehow, sometime be set to false! ;-)

Friday 30 October 2009

Java's Closures Debate for C++ Eyes


Years ago Sun described Microsoft's delegate extension proposal for Java as:

Many of the advantages claimed for Visual J++ delegates -- type safety, object orientation, and ease of component interconnection -- are simply consequences of the security or flexibility of the Java object model.
....
Bound method references are simply unnecessary. They are not part of the Java programming language, and are thus not accepted by compliant compilers. Moreover, they detract from the simplicity and unity of the Java language.*

Blah, blah, blah, marketing drivel! I don't know what is more annoying here: 1. the object-oriented orthodoxy, or 2. the complacency of "we know it all better!" Be that as it may, but now James Gosling (Mr Java himself) says he'd always wanted closures instead of anonymous inner classes in Java, and there is not one, but a whole 3 (!!!) proposals for how to include functions as first class citizens of the Java language! So much for the "simply unnecessary" argument...

And there is considerable controversy around the very idea of closures, somehow similar to the C++ debate around Concepts we discussed recently, deeming it unnecessary and "too complicated". As we are lucky to get lambda functions in C++0x and don't complain in the least, the debate in the Java community made me curious. Come on, why not have closures? Even C# has some!

So what is a closure? For the sake of this blog entry it is something like a lambda function in C++ (or Python or Groovy), i.e. an anonymous, freestanding function:

  auto print = [](int x) { cout << x; };
Now let's do it in Java.

1. Proposals

So let's have a look at the three proposals.**

a) BGGA (Bracha, Gafter, Gosling, Ahé)

This is the most ambitious proposal. It introduces a new type into the language: a function type, and a totally new notation for it. This notation reminds me of Haskell's or of the lambda calculus types.

{T => U} for a function from T to U
{T => U => V} (or is it {T,U => V}?) for a function from T and U to V
I think this notation is one of the reasons why many in Javaland don't like this proposal one bit. The second one is surely the nonlocal control statements (returns, breaks etc). Come again, non-what? Well, it means that for example the return statement doesn't return from the closure but from the scope which invoked the closure! They don't bind to the closure but to the environment! What should that be good for? Wait till the next section for the explanation, dear reader.

A usage example:
  {Integer => void} print = { Integer x => System.out.println(x); }
b) CICE (Concise Instance Creation Expressions)

This is the simplest of the three proposals. Basically it's only syntactic sugar for defining anonymous classes on the fly, nothing more. No first class functions! No obscure functional programming theory!

A usage example:
  public interface IPrint { public void invoke(Integer x); } // exactly 1 method!
IPrint print = IPrint(Integer x) { System.out.println(x); }
It's the simplest of the proposals, but to my mind, the most annoying one: you always have to refer back to some interface definition, and that sucks!

c) FCM (First Class Methods)

This is a kind of middle ground between the previous two: it introduces first class functions, but doesn't have the nonlocal binding for the closure's return statement or the irritating lambda-type notation.

A usage example:
  #(void(Integer)) print = #(Integer x) { System.out.println(x); }
Of all three, syntactically I like the BGGA proposal best, as you can just write your code in a familiar code-block manner, without FCM's hash or CICE's interface tag.

2. The Debate

To put it mildly, sometimes I find the debate somewhat childish. Here are the typical arguments:***

pro: If they don't implement closures, I'll never write another line of Java and will go over to Scala
con: If they implement closures, I'll just break down and cry... I cannot change to any other language, but I'll hate you all for it!
Come on people! It's just a language feature! And you don't have to use it at all if you don't like it. Besides, it's not the developers who choose languages, it's the managers! If they can be convinced that a language offers some substantial benefits, it will be adopted. But not because you like Scala or something else better!

Will the closures make Java too complicated? Would they really

"...complicate the language beyond normal usability: mainstream programmers will turn away from Java, so the argument goes, and move on to simpler programming languages
...
Java will turn into a guru language used only by experts in niche areas
"**?

So tell me which the simpler languages for the JVM are! I can see no non-scripting one which would be simpler. Guru language? Are you kidding me? The closures are an attempt to make Java less verbose, so everyone should be happy. So why the fear?

One (non-PC) answer might be that the "too complicated" camp is perfectly happy with the language as it is now, doesn't mind the lack of elegance, and doesn't want to learn new mechanisms. And maybe there is some point where there are too many mechanisms (or paradigms) in a language? Me, as a C++ guy, I'm accustomed to "multiparadigm design", as C++ is a real behemoth in that respect. I just try not to think about the mechanisms I'm not using at the moment.

Maybe in Javaland this isn't so simple, as the programmers were lectured for years that there is only one paradigm in the world: the object oriented one. This would explain why some people think that the addition of generics stretched the complexity of the language to its limits (the other explanation is the design choices made for backward compatibility, in particular the wildcards).
So it seems to be either the irritation caused by generics or the unusual nonlocal returns in BGGA which provoked the strong feelings, don't you think?

3. Discussion


But let's get more technical: why do we need closures in Java? The answer is probably: because Ruby has got them! The Java community seems still to be in a collective Ruby trauma, and emulating Ruby features is a good enough reason in itself. As they used to say behind the Iron Curtain: to learn from Ruby is to learn how to win! So now, just as in Ruby (or Groovy and C# for that matter) we'll be able to write in Java:

  Utils.forEach(names, { Integer x => System.out.println(x); }); // BGGA

So, that's kind of what we've always had in C++ with the STL and lambda libraries.

But that's not all there is! In each of the three above proposals there is a second part, which allows some kind of finer-grained resource management, i.e. a kind of "destructors for the modern times". I can only say "finally", because Java's resource management is just a big pain in the rear.

BGGA tries to achieve this goal by the nonlocal control statements mentioned above, which will then allow to implement the "execute around" pattern as so called "control abstractions"**, but are somehow not very intuitive. In the BGGA proposal the keyword this will be lexically bound and will automatically refer to an instance of the enclosing type in which the closure is defined. For me, Groovy solves the problem of nonlocal bindings somewhat more clearly - introducing two members for a closure: owner (referring to the enclosing object) and delegate (defining the lexical binding). In this way we can always explicitly say what type of lexical binding we need, the default value of delegate being owner!

The other proposals try to address the same question as well: CICE by ARM (Automatic Resource Management Blocks - a special type of block where a dispose method will be automatically applied when leaving it) and FCM by JCA (Java Control Abstractions - this one is more like the "execute around" pattern mentioned above, and requires nonlocal lexical bindings as BGGA does).

At this point let us try to draw some conclusions.

Firstly: with C# having lambdas in its 3.0 version****, and C++0x having lambdas and the auto specifier, leaving closures out of Java 7 would leave Java looking pretty old in comparison!

But don't despair, there is a library solution out there (http://code.google.com/p/lambdaj/). Maybe a library solution will be sufficient in this case? I don't know, the syntax doesn't look very intuitive, but it's probably the best you can get.

Secondly: the most instructive lesson here, for me, is that the old notion of a destructor isn't so old-fashioned after all! C# meanwhile has its using statement, and Java deserves (and needs) some destructor-like mechanism as well! So much for the (formerly) "modern languages".

--
* from Sun's rebuttal of delegates - http://java.sun.com/docs/white/delegates.html, where they claim that delegates (i.e. the C# equivalent of C++ member function pointers) are bad, because anonymous classes can do all the same and are object oriented too! A typical case of the object craze of the nineties IMHO

** I will only skim the surface here, as my aim is only the look-and-feel. If you need more information a good place to start is: http://www.javaworld.com/javaworld/jw-06-2008/jw-06-closures.html ("Understanding the closures debate - Does Java need closures? Three proposals compared", by Klaus Kreft and Angelika Langer, JavaWorld.com, 06/17/08)

*** rephrased ;-) from the discussion here - http://infinitescale.blogspot.com/2007/12/bgga-closures-end-of-many-java-careers.html, but seen in that form in other places on the Web

**** Let us digress a little: in C# we'd define the above lambda like that:

  Action<int> p = name => Console.WriteLine(name);
or use it directly like that:
  names.FindAll(name => name != "Marek")
.ForEach(name => Console.WriteLine(name));

Note that in C++ the lambdas aren't templates (they are monomorphic, i.e. we must define the argument types) but in C# they are! To achieve the same in C++ you have to use a template lambda library.
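
If you want something polymorphic in C++ today, you fall back on a hand-written functor with a templated call operator - a little sketch of mine to show the difference:

  #include <iostream>

  struct Print
  {
      template<typename T>
      void operator()(const T& x) const { std::cout << x << '\n'; }
  };

  int main()
  {
      Print print;
      print(42);      // works for an int...
      print("hello"); // ...and for a C string - one object, many argument types
  }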

I mean the C# language is looking increasingly cool, perhaps I should learn it, as everybody can program Java these days???

Thursday 27 August 2009

Tony Hoare and the meaning of life (well, almost...)

You know, I was always wondering about programming - is it an art, a craft, or is it an engineering discipline? Some crazy hackers maintain it to be an art, more down-to-earth types oscillate between craft and engineering.

My personal feeling was that it couldn't be engineering - I missed the scientific part of it, it was too "soft". For me software engineering appeared rather to be a set of rules of thumb for organizing our mental work (functions, classes, modules...), something from the realm of cognitive science perhaps :-). For example, I couldn't take my multithreading program and prove that it won't deadlock - but more about that in a future (i.e. not yet written, but planned) installment of this blog...

1. The presentation

And then, some months ago, I came across that presentation* given at the QCon 2009 conference by Sir C.A.R. Hoare (aka Tony Hoare of Hoare logic, Quicksort, and CSP fame) about the relationships between programming and computer science.

Computer Science is about how programs run and how correct programs can be constructed *
By the way, I used to dislike the omnipresent video presentations, which seem to be replacing old-fashioned articles as the programmer's information source of choice. Well, to put it frankly, I hated them! Instead of being able to quickly scoop up the essentials and then read the article if it interested me, I'm now forced to listen for hours to some uninspiring presentations, and often to some really horrible accents to boot, only to discover that in the end only the title of the video was interesting!

But on the other hand, with the video presentations I'm now able to hear people like Linus Torvalds** and Tony Hoare in person, and it proved to be very rewarding in both cases. Watching Tony Hoare was a great pleasure - he's a wonderfully gentle and good humored old man, I'd say my idol of sorts. And as he's got his first degree in philosophy, he's bound to have some interesting insights into the question at hand.

His answer is both simple and compelling (I'm rephrasing it here):
Software Engineering is an engineering discipline because it uses Computer Science, and Computer Science is a science because its results are exploited and confirmed by software products!*
Sounds at first rather convincing, doesn't it? But wait, for a philosophy major, isn't he overlooking something?

2. Circulus Vitiosus?

Well, embarrassingly, it's looking suspiciously similar to a circular argument, right? For SW Engineering gets justified by CS, but CS is justified by SW Engineering? It certainly appeared like this to me at first glance. So let us have a closer look at it:

science is: "the systematic study of the structure and behaviour of the physical world, especially by watching, measuring and doing experiments, and the development of theories to describe the results of these activities "***
Does this apply here? Well, maybe, if for "physical world" we substitute the human-built things like hardware and binaries. But wait, don't we rather work with mental constructs instead (you know, languages, functions, classes)? Well, yes, but they are models, like math is a model in civil engineering. In the end it's all about how the binary runs!

engineering is: "to design and build something using scientific principles"***
This one seems to fit perfectly. So both parts of the argument are correct, but is it a circular one? Let's translate it from English into the language of logic:
(CS usedIn SWEng => SWEng is Eng) and (CS usedIn SWEng => CS is Sc)
You see it's not really circular, as it's two separate propositions, and they are just rephrasing the two above definitions (check it!). Because science is defined as a special, rigorous kind of examination of the physical world, and engineering uses science when dealing with this world directly! But wait again, it's not said that we are allowed to use only science in engineering, or is it?

3. The Engineer and the Scientist

This leads us to another interesting aspect of that presentation, which was the comparison between science and engineering, and how they relate to each other. Somehow explaining the above relationship (I must paraphrase again):

so it's first:
scientists are interested in pure truths, i.e. programs which don't have errors, while an engineer must compromise, i.e. live with programs that have got some errors,

and second:
scientists are interested in certainty, while an engineer lives in uncertainty, and must take risks and perform risk management,

and thus:
scientists should be women, engineers should be men (Yes, he really made this little joke:)

There are several other comparisons along these lines in the presentation, and furthermore, even gradations between the two extremes like applied scientist or scientific engineer are introduced. So there is (almost) a continuum of positions between the two extremes of a pure CS scientist and a common hacker.

And that fact harbors some kind of consolation, particularly if I'm in a very pedestrian project and must do the most boring things. Just remember, you can always move up the ladder, and use more science, more absolute truths, and thus (as the ancients believed) come in contact with more beauty. And that's maybe the ultimate "consolatio philosophiae", or do you think this is an exaggeration on my part?

--
* all citations are not original but rephrased by yours truly, so if they are wrong it's my fault! Original source: Tony Hoare, "The Science of Computing and the Engineering of Software", QCon Conference, Jun 11, 2009 , London: http://www.infoq.com/presentations/tony-hoare-computing-engineering

** for example this presentation by Linus was fun: http://www.youtube.com/watch?v=4XpnKHJAok8&feature=player_embedded

*** the definitions are taken from "Cambridge Advanced Learner's Dictionary", as I was looking for simplicity: http://dictionary.cambridge.org/

Tuesday 28 July 2009

Two C++ curiosities with a deeper meaning

Lately, I was quite surprised by two things in the realm of programming, both of them C++ related. The first one is the new attribute syntax for C++09*, which I curiously failed to notice in the new standard proposal when I first had a look at it.

1. The first one


What are attributes? No, they aren't class data members with a convenient access syntax in this case (like Groovy's attribute syntax - or was it Python?).

I personally only knew attributes as an ugly Microsoft Windows COM-specific hack, by virtue of which you can inject COM related information into C++ code. Look at this example:

  #define _ATL_ATTRIBUTES 1
#include <atlbase.h>
#include <atlcom.h>
#include <string.h>
#include <comdef.h>
  [module(name="test")];
  [ object, uuid("00000000-0000-0000-0000-000000000001"), library_block ]
__interface IFace
{
[ id(0) ] int int_data;
[ id(5) ] BSTR bstr_data;
};
  [ coclass, uuid("00000000-0000-0000-0000-000000000002") ]
class MyClass : public IFace
{
private:
int m_i;
BSTR m_bstr;
  public:
MyClass();
~MyClass();

int get_int_data();
void put_int_data(int _i);
BSTR get_bstr_data();
void put_bstr_data(BSTR bstr);
};
Doesn't it send shivers down your spine? You might say I'm not consistent, because I just lashed out at excessive XML configuration files (here), but when presented with the alternative, I don't like it either! And now there are annotations in Java too, doing a similar thing... What can I say, I grew up with the classical paradigm of the standalone program using external functionality through libraries. But nowadays you don't program just for the machine or the OS, you program for some giant framework! Did you hear the phrase "programming by bulldozer"? And to my mind there's no satisfactory model of how to do this! For me both the configuration file madness and mixing code with meta information are somehow unsatisfactory. Look at this example JBoss Seam action class:
  @Stateless @Name("manager")
  public class ManagerAction implements Manager {
      @In @Out private Person person;
      @Out private List<Person> fans;
  }
Well, what do you think, does it look OK? We have (nearly) POJOs here (i.e. no framework class derivation) which is the Holy Grail in Java programming lately, but the code is littered with annotations. To me they are no better than the old C++ event macros used in some very old Windows frameworks!

Now the question arises: is that a cognitive bias caused by the preceding simple (and thus elegant) programming model? Remember, even Herb Sutter, when charged with the task of making .NET programming possible with C++, didn't embrace the traditional program + library model! He rather created a language extension, the C++/CLI language dialect. Maybe he was constrained to do so by his commission of getting Managed C++ up to speed, but on the other hand, in the design paper he argued it's the natural way and compliant with the spirit of C++. Personally I didn't like this language extension one bit, opting for a carefully designed library or framework oriented solution, but maybe it's simply not possible???

Sorry, I digressed. So back to the subject!

The second species of attributes I knew were the Gnu attributes, looking like this:
  void kill(int) __attribute__((__noreturn__)); //GCC specific
but I never used them either, because I thought they were only of any use with some Gnu-specific language extensions, which I always avoided like the plague. And do you know what the new standard compliant attributes look like now? Look ("C++09 Working Draft n2798", 10.2008, Chap. 7.6.):
  class C
  {
      virtual void f [[final]] ();
  };

It's a crossbreed of Microsoft and Gnu annotations, isn't it? That's why everyone is happy, and why the syntax is really ugly. And do you know what? There are a fistful of attributes defined in the standard (but vendors are allowed to add new ones) and one of them is, you guessed it, final! Could someone tell me why this one should be an attribute and not a keyword - is it supposed to matter only in specific environments? Or are we making way for vendor specific language extensions? So maybe the whole attribute thing is in there to placate the major vendors: Microsoft and Gnu? Because as I got it (correct me please if I'm wrong!) there isn't a mechanism to plug in an attribute processor as with Microsoft's COM attributes or Java's annotations, so the whole construct is aimed at the compiler writers.
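
To illustrate the crossbreed point, here is the same "this function doesn't return" hint in the three dialects (the vendor spellings are from memory, so double check them; and of course you'd only use one per compiler):

  void fatal_error(const char* msg) __attribute__((__noreturn__)); // Gnu spelling
  __declspec(noreturn) void fatal_error(const char* msg);          // Microsoft spelling
  [[noreturn]] void fatal_error(const char* msg);                  // the new standard attribute form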

Summing up: first, Java annotation syntax, as seen in the Seam example above, seems to be more pleasing to the eyes, and second, I didn't really grasp the need for standardized annotations, except for vendor's conveniency.

2. The second one

The second curious thing I stumbled upon is the removal of Concepts (or should I say of the "Concepts" concept?) from the C++09 standard proposal. As I read**, it was the standard library where the Concepts were first removed from. Come again? When I had a look at the draft, all the std library chapters were strewn with concepts, every one of them in pretty blue (or was it green?) colour.

And weren't the Concepts supposed to be the best thing since sliced bread? I mean not for me, as I had a look at them once and didn't want to learn it (in particular the whole concept_map stuff!!!), but for the compiler writers - they could at last avoid the horrendous error messages for type errors in template instantiations. Besides that, it was Bjarne's pet project, and I thought that all Bjarne's PhD and post-doc students were working on it!

And now, out of the blue, the whole feature was cancelled! So maybe there's a limit to the pushing of the type system? C++ is a relentless type-monster already, and sometimes programming it feels like writing a mathematical proof to me. Thus building another level of type definitions on top of the generic meta types is maybe too much of a good thing. But, OK, the idea is not to take the "duck typing" out of templates altogether, but to strike a delicate balance: add a minimum of concept declarations and get the best error messages and type security possible. Is that task simply too complicated and too big, or is it only the current design of the mechanism which is flawed?
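
For comparison, here is the kind of hand-made constraint I'll probably keep writing instead, using only what's left in C++0x (static_assert plus the type traits) - my own sketch, nothing from the proposals:

  #include <type_traits>

  template<typename T>
  T twice(const T& x)
  {
      static_assert(std::is_arithmetic<T>::value,
                    "twice() requires an arithmetic type");
      return x + x;
  }

  // twice(2.5) compiles; twice(std::string("oops")) fails with the message above
  // instead of pages of instantiation errors.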

As Bjarne said in the paper which finally dealt the death blow to the Concepts***:

Concepts were meant to make generic programming easier as well as safer. It is part of a whole collection of features aimed at simplifying GP, together with template aliases, auto, decltype, lambdas, etc. However, "concepts" is a complex mechanism and its language-technical complexity seems to be leaking into user code. By "language-technical complexity" I mean complexity arising from the need of compiler/linker technology rather than complexity from the solution to a problem itself (the algorithm).

My particular concern is that in the case of concept maps, in the name of safety we have made templates harder to use. We require programmers to name many entities that would better be left unnamed, cause excess rigidity in code and encourage a mindset in programmers that will lead to either a decrease in generic programming (in favor of less appropriate techniques) or to concepts not being used (where they would be useful). We have overreacted to the problems of structural typing. ***
This means that exactly what I wasn't willing to learn leaked out into the user space! Bjarne made a series of proposals on how to avoid it, but my general feeling after having read the paper was that the design of Concepts is somehow flawed, and cannot be fixed by some quick workarounds. So it's no wonder that the committee had to make a tough decision. And that after years of work! And nobody saw it coming all these years? Don't you think it's uncanny - how little we can do to design a complicated language feature and avoid mistakes? Or maybe we see here how badly the standardisation process is working. These problems were there for a much longer time and nobody said a word! Because of lack of time, because of politics, or simply because no one really understood the topic? So it had to be the creator of C++ himself to blow the whistle:

The use of concepts is supposed to help people write and use a wide range of templates. The current definition of concept maps and the philosophy that seems to go with them makes it harder.

Addressing this is important. I suspect that the alternative is widespread disuse of concepts and libraries using concepts. I would consider that a major failure of C++0x. ***

I'd say he killed his pet project out of a sense of responsibility.

And where is the promised deeper meaning? As I said previously: there's maybe a limit to the type system. Because for me it felt sometimes like constructing inaccessible cardinals on top of the regular ones (wiki) in set theory - just very, very subtle. And maybe the horrendous error messages for templates are just the proverbial price of the (type) freedom, and we'd better pay it? When you read Bjarne's article*** you see that the greatest problems are encountered in subclassing and conversions - so C++ is much more "modern" a language than most of us would like to believe, as it allows us so much type freedom, using the type system itself to achieve it!

And maybe the design problem is relatively simple to solve, but the archaic linker technology we inherited from C is simply too old for that? As it reads:

"... complexity arising from the need of compiler/linker technology rather than complexity from the solution to a problem itself (the algorithm)"***
---
* Danny Kalev "Introducing Attributes", http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=440
** Danny Kalev "The Removal of Concepts From C++0x", http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=441
*** Bjarne Stroustrup "Simplifying the use of concepts", 2009-06-21, http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2906.pdf

Saturday 27 June 2009

For German Speakers only...

Hi everyone! As it happens, I stumbled recently upon an interesting C++ website, which I quite enjoyed, and even happened to learn something new in the process! Its address is:

http://www.kharchi.eu/wiki/doku.php?id=cpp:start

I particularly liked the coverage of some more modern topics like my favourite theme of Web programming in C++, embedded C++ scripting, or the string tutorial covering conversions between different encodings (and not to forget the link to the Pirate Party :)). Don't get me wrong, it's not the uber C++ webpage, but it's a cute, interesting little thing!

Unfortunately it's all in German :-((. Good for me (see, learning languages pays off sometimes), bad luck for the rest of you. So if you speak German, have a look at it!

Speaking about languages, there's a little story as a goodie: on some project several years ago I had to update an open source tool for copying entire websites. It was thousands and thousands of lines of plain, spaghetti-style C code. As if this weren't bad enough, all the comments that were there weren't in English, no, they were all in French!!! Had I not learnt some French for my holidays, I couldn't possibly have finished this assignment! Without the comments the code was absolutely incomprehensible! However it worked fine, even if it wasn't any OO or structured programming thing. Why? For all we know it shouldn't ;-))).

Thursday 28 May 2009

Windows not so bad after all?

This post is for my distinguished hacker colleague Stefan Z. Do you remember how we were always complaining that you cannot do any serious development on Windows, as the following standard Unix command line isn't possible?
    cat out.txt | grep ERROR
Well, as I'm doing Windows programming now, I recently found out that it's possible after all! Incredible but true! However the syntax is somewhat different:
    type out.txt | findstr ERROR
It's things like that that show me how fast time is passing! A couple of years ago, I'd never have believed that Windows would ever catch up. I think the tide turned with XP.

But wait, it gets even better! Here's a script for checking if a process is running and killing it if so:

   tasklist /FI "IMAGENAME eq myProcess.exe" | findstr myProcess.exe
if ERRORLEVEL 1 GOTO next

echo "-- kill running myProcess"
taskkill /F /IM myProcess.exe

:next
echo "-- starting the XXX process"
Even a sleep is possible (although it's arguably very ugly):
    @ping 127.0.0.1 -n 2 -w 1000 > nul

And here (in the IT-Dojo) you can see that the Windows command prompt has some interesting history mechanisms!

Tuesday 19 May 2009

Two Java Collection Utilities

Lately I stumbled upon two interesting Java libraries which add search capabilities to Java collections: one adding SQL search and the other adding XPath search syntax to an existing collection. I think it's cool - you can take a collection of Java objects and treat it in a high level way! I suppose they are implemented using introspection.

The first one is called JoSQL (http://josql.sourceforge.net/index.html) and can be used like this:
  try {
      Query q = new Query();
      q.parse(
          "SELECT * " +
          "FROM java.lang.String " +
          "WHERE length <= :stringSize " +
          "AND toString LIKE '%e%' " +
          "ORDER BY length DESC");

      q.setVariable("stringSize", "5");
      List<String> names = new ArrayList<String>();
      String[] n = {
          "alpha", "beta", "gamma", "delta", "epsilon", "zeta", "pi", "chi"
      };
      Collections.addAll(names, n);

      QueryResults qres = q.execute(names);
      List res = qres.getResults();
  }
  catch(Exception e) ...
Here I got { "beta", "delta", "epsilon" } as the result. The only problem I can see so far is that the library doesn't use generics. The other one is the somewhat unintuitive way it treats the "SELECT *" query. Normally the JoSQL library treats a collection of Java objects as a table whose columns are formed by the values returned by the class' methods. However, in case of a "SELECT *" query, we won't get all of the object's attributes, but just the whole object as it is. On the other hand, it's equivalent, isn't it?

The second one is called jXPath (http://commons.apache.org/jxpath/), and is part of the Apache project. You can use it like that:
    JXPathContext context = JXPathContext.newContext(employees);

Employee emp =
(Employee)context.getValue("/employees[name='Susan' and age=27]");
Pretty nifty, isn't it? But while the SQL query syntax seemed obvious to me, the XPath search expressions seem plain ugly and non-intuitive. I don't know either SQL or XPath very well, but I find myself always forgetting pretty much everything about XPath while SQL sticks. Maybe because XPath is too low-level and too much of a hack?

You can read in many blogs that the above libraries are pretty much the same, but in reality the XPath query syntax is better for hierarchical data, somewhat like this:
    Employee emp =
(Employee)context.getValue("/departments/employees[name='Johnny']");
I assume you can do the equivalent in JoSQL, but you'd have to use joins, and that isn't as convenient for simple hierarchical structures like the one above. So, in the end, maybe XPath isn't so bad after all?

Wednesday 29 April 2009

From DLL- to XML-hell

This is one of the original articles I wanted to write when I started this blog. Well, it's taken some time, and it's a little outdated now, but at last I've kept my promise (or rather one of them)!

1. The Lament


When I switched from C++ to Java in a previous project I thought I had plunged directly from the DLL- into the XML-hell! Well, not exactly (as I didn't program under Windows in the preceding project), but a little hyperbole makes a good start! The XML part of the title, however, is true. The very first thing I noticed was the ubiquitous XML - you had to use it for compilation (Ant scripts), you had to use it for deployment (web.xml), you had to use it for your application (struts2.xml, spring.xml, log4j.xml).

The most annoying part of this was the change from makefiles to Ant scripts: so much superfluous verbiage! Look, compare the Ant script with the equivalent Ant-builder script in Groovy, taken from the "Groovy in Action" book; first the XML file:

  <project name="prepareBookDirs" default="copy">
      <property name="target.dir" value="target"/>
      <property name="chapters.dir" value="chapters"/>
      <target name="copy">
          <delete dir="${target.dir}" />
          <copy todir="${target.dir}">
              <fileset dir="${chapters.dir}"
                       includes="*.doc"
                       excludes="~*" />
          </copy>
      </target>
  </project>
Then the builder script:
  TARGET_DIR = 'target'
CHAPTERS_DIR = 'chapters'
  ant = new AntBuilder()
ant.delete(dir:TARGET_DIR)
ant.copy(todir:TARGET_DIR){
fileset(dir:CHAPTERS_DIR, includes:'*.doc', excludes:'~*')
}

Well, that's at least readable (and writeable), and it states exactly what you want! When I see such a comparison I think that XML code should only be used as Ant-internal, assembly-code level data representation! You might find it interesting that even the creator of Ant himself admitted that using XML was "probably an error"*.

The second problem was that, (as another programmer put it):

Developers from other languages often find Java's over-reliance on XML configuration annoying. We use so much configuration outside of the language because configuration in Java is painful and tedious. We do configuration in XML rather than properties because...well, because overuse of XML in Java is a fad.**

I can only confirm this. You can get a feel for how much XML "code" was needed in a Java web application I was writing back then from the following (admittedly pretty old) table***:

     Metric                      Java + Spring/Hibernate    Ruby + Rails

     Time to market              4 months, approx.          4 nights,
                                 20 hours/week              5 hours/night
     Lines of code               3,293                      1,164
     Lines of config             1,161                      113
     Number of classes/methods   62/549                     55/126

Do you see? The configuration is 30% of the code (!!!), and that's 10 times more than was needed in Rails! That's some dependency! By the way, note that Rails needs only 3 times less code than Java, despite the fact that Java libraries can sometimes be pretty low level!

Now, for me the Spring 2 platform was somehow the apogee of the XML over-dependency - the endless configuration files weren't even human readable, you'd better use a graphical Eclipse plugin:

to edit the wirings. That's like needing something like UML to tame the complexity of the XML files (and of Spring's configuration, of course):


A rhetorical question: isn't it too much of a good thing for something as simple as configuration files? I found Spring-2 MVC (Spring's web GUI application framework) particularly bad in that respect. Comparing it to the Struts 2 configuration, I had the impression that every obvious fact must be expressed in XML, and that there are no defaults at all:

  <bean name="userListController"
        class="de.ib-krajewski.myApp.UserListController">
      <property name="userListManager">
          <ref bean="userListManager" />
      </property>
      ....
  </bean>
  <bean id="userListManager"
        class="de.ib-krajewski.myApp.UserListManager" />

and so on, and on, and on.

But wait, it gets even better: some people are so enamoured with XML that they are using it as a programming language****, either doing patterns with Spring and XML configuration files, or even inventing an XML-based programming language and writing its interpreter in XSLT (I'm not joking!).

A word of caution: the Java community has noticed the problem and is trying to reduce the amount of XML needed. Unfortunately, I haven't got any hard numbers, but each new release of a Java framework seems to claim a "reduced XML config file size" - take for example Spring MVC v.3. However, just as the DLL hell is still a reality despite Microsoft's efforts and the introduction of manifests (really, look here!), to my mind the over-dependence on XML won't vanish overnight from the Java world either - it's at its core!

2. The Analysis

2.1 XML data

Why are we all using XML? The received standard answer is: because it's a portable, human-readable, standardized data format with a very good tool support!

So why does it all feel so wrong? The first reason I see is that XML is too low level for human consumption. It claims to be human-readable while it's only human-tolerable - see the Groovy AntBuilder example above. There are tons and tons of verbiage (although that's something a Java programmer will be rather accustomed to ;-)), which is what machines need, but it's not how the human mind works. It needs high level descriptions because it's only all too easily distracted by unneeded details.

When I recall my own config file implementations of yore, I conclude that I never needed more constructs than single config values and lists of them. And I never needed hierarchical config data for my systems. So if people are using XML for configuration, they are maybe after something more than pure config settings; maybe they are looking for a scripting solution for the application (as mentioned in *), along the lines of the late Tcl? But with Tcl the model was entirely different: the Tcl scripts glued together passive modules of code, whereas now the modules are active and use XML as a database of sorts... So is that it, at last? People using XML data as a poor man's read-only database? It's a good, ad-hoc solution, and you can emulate hierarchical, network and OO databases without much of a hassle, can't you? So people are using XML as a kind of quick and dirty hack, and hacks aren't normally very pretty ;-).

2.2 XML in Java

The second part of the question is: why are we using so much XML in Java? The answer is probably that we need XML config files to increase modularity and to ensure the loose coupling of our systems. The Spring framework allows us to pull the parts of the system together at startup or run-time, and that's supposedly a good thing, isn't it.

Coming to this part of the question: I think that the loose coupling thing is overrated. As Nicolai Josuttis said in his book about SOA (we reported ;-)):

"...loose coupling has its price, and it's the complexity."

So firstly, a little bit of coupling isn't that bad if it decreases complexity and increases readability of code! To my mind, it's a classical tradeoff, but as some developers are only too prone to introduce tight coupling all over the place, the received wisdom is to avoid coupling at any price! But be honest, you can't decouple everything if the parts are meant to work together! Secondly, I think that the "inversion of control" pattern is a bit of overkill too.

Personally, I'd like to glue the parts of the system together directly in code, so that I can see how my software is composed right in my source code - it should be the sole point of reference! Thus I much prefer the approach Guice or Seam are taking (and nowadays the Spring framework itself) - that of annotations. At last we have the relevant information where it belongs - in code! The problem here is that the annotations aren't really code, they are metadata, and I don't like the idea of using metadata where you could do things directly through some API. But hey, that's probably the way we do such things in Java...

3. A Summary

That's what I wanted to write approximately 2 years ago. Meanwhile I use XML everywhere in my architectures and designs, mainly because it excludes any discussions about formats! No one has ever said anything against XML!

But programming it (in C++ and Qt) is a real pain - I first tried DOM, then SAX - and it was an even greater mess (well, there's a C++ data binding implementation à la Java's Apache XMLBeans, but it's not used at my client's :-((). XPath selectors are supposed to be better, but Qt doesn't have them yet. The good thing is, everyone will think the system is sooo cool because it uses XML, and I have to make my users happy!
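
Just to give you an idea what I mean, here is roughly what reading a single attribute with Qt's DOM classes looks like (a sketch from memory of the Qt 4 API; the file layout and element names are invented):

  #include <QDomDocument>
  #include <QDomElement>
  #include <QFile>
  #include <QString>

  QString readUserName(const QString& fileName)
  {
      QFile file(fileName);
      if (!file.open(QIODevice::ReadOnly))
          return QString();

      QDomDocument doc;
      if (!doc.setContent(&file)) // parse the whole file into a DOM tree first
          return QString();

      // then dig through the tree by hand - no XPath shortcuts here
      QDomElement root = doc.documentElement();
      QDomElement user = root.firstChildElement("user");
      return user.attribute("name");
  }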

PS: And you know what? The new dm application server by SpringSource uses JSON for configuration files!!! I told you so ;-)

--
* "The creator of Ant exorcising his own daemons" - this article was originally stored under http://x180.net/Articles/Java/AntAndXML.html, but this link seems to be dead, so I'll host it on my website for a while (hope that's OK...): http://www.ib-krajewski.de/misc/ant-retrospect.html

** as a matter of fact, he's a Ruby programmer too, so he isn't that impartial: http://www.brainbell.com/tutorials/java/About_Ruby.htm

***taken from the book "Beyond Java" by Bruce Tate: http://commons.oreilly.com/wiki/index.php/Beyond_Java/Ruby_on_Rails, unfortunately the original link stated there isn't working: Justin Gehtland, Weblogs for Relevance, LLC (April 2005); http://www.relevancellc.com/blogs. "I *heart* rails; Some Numbers at Last."

**** "Wiring The Observer Pattern with Spring": http://www.theserverside.com/tt/articles/content/SpringLoadedObserverPattern/article.html, and "XSL Transformations. A delivery medium for executable content over the Internet" in DDJ from April 05, 2007: http://www.ddj.com/architect/198800555 . I cite from the latter:

XIM is an XML-based programming language with imperative control features, such as assignment and loops, and an interpreter written in XSLT. XIM programs can thus be packaged with an XIM processor (as an XSLT stylesheet) and sent to the client for execution.
See, I wasn't joking...

Saturday 4 April 2009

Some distributed computing news: cuddly elephants, pigs, nymphs and couch computing

As I did some distributed computing research in my previous life, and then tried to get involved with Google, I'm still somewhat interested in the subject. If you are into distributed computing, you'll know that Google is using a distributed C++ system called MapReduce* for indexing the crawled webpages. It happens that there's an open-source Java implementation of Google's system (based on Google's published research papers) called Hadoop. Until now I didn't know of any serious usage of Hadoop in big distributed systems, but as it turns out, Yahoo! is using Hadoop internally** for spam detection!
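
If you haven't seen the programming model yet: the canonical example is word counting, which I sketch below as a toy, single-process C++ version of my own (not Google's or Hadoop's API!) - the real systems run the map and reduce phases in parallel over thousands of machines:

  #include <iostream>
  #include <map>
  #include <sstream>
  #include <string>
  #include <utility>
  #include <vector>

  typedef std::pair<std::string, int> WordCount;

  // "map" phase: split a line of input into (word, 1) pairs
  std::vector<WordCount> mapPhase(const std::string& line)
  {
      std::vector<WordCount> pairs;
      std::istringstream in(line);
      std::string word;
      while (in >> word)
          pairs.push_back(std::make_pair(word, 1));
      return pairs;
  }

  // "reduce" phase: sum up the counts for each word
  std::map<std::string, int> reducePhase(const std::vector<WordCount>& pairs)
  {
      std::map<std::string, int> counts;
      for (std::size_t i = 0; i < pairs.size(); ++i)
          counts[pairs[i].first] += pairs[i].second;
      return counts;
  }

  int main()
  {
      const std::map<std::string, int> counts = reducePhase(mapPhase("to be or not to be"));
      for (std::map<std::string, int>::const_iterator it = counts.begin(); it != counts.end(); ++it)
          std::cout << it->first << ": " << it->second << '\n';
  }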

Yahoo's researchers have also designed a new language called Pig Latin (?!)*** for distributed programming on the Hadoop platform. The idea is rather an interesting one: they pointed out that the existing way of writing programs for the mapper and reducer parts of a distributed application is too low-level and not reusable, but somehow familiar to programmers. In contrast, a declarative language like SQL poses some problems, as the paradigm there is completely different:

At the same time, the transformations carried out in each step are fairly high-level, e.g., filtering, grouping, and aggregation, much like in SQL. The use of such high-level primitives renders low-level manipulations (as required in map-reduce) unnecessary.
...

To experienced system programmers, this method is much more appealing than encoding their task as an SQL query, and then coercing the system to choose the desired plan through optimizer hints.
So it provides a middle ground, where we can avoid SQL and write down the computation steps, but can do that at a higher level!

Unsurprisingly, Microsoft already has a proprietary distributed computing platform called Dryad, which tries to do something like this as well, but is using C# and LINQ (i.e. an SQL-like query language):

DryadLINQ generates Dryad computations from the LINQ Language-Integrated Query extensions to C#.

But I don't know how much of this is a research project and if it's really used internally. On the other hand, Microsoft recently acquired a search company called Powerset, which is using Hadoop at its core, and which, as the gossip goes, should be used to bring Microsoft's LiveSearch up to speed.

Facebook, in turn, is using Hadoop, but added a Business Intelligence tool called Hive (now released as open source!) on top of it. It uses an SQL-like query language, as BI users understand that best. Beneath the surface Hive leverages Hadoop and translates the SQL-like queries into MapReduce jobs.


Not bad, cuddly elephant! It seems everyone is loving you!


Incidentally, speaking of SQL and non-SQL: there's a non-SQL database by Apache called CouchDB. It is written in Erlang (we reported already**** ;-)) and implements a distributed document database. It's designed for lock-free concurrency using Erlang's "share-nothing" philosophy. To my mind it resembles Linus Torvalds' Git source control system, as it holds all versions of the documents, determining the "winner" to be visible in the defined views. Hear, hear! Erlang's alive, distributing and kicking!

Another thing worth noticing is that it's not only SQL anymore; there are attempts to do things differently, although others are sticking with the ol' 'n' reliable. A paradigm change on its way? Interesting times, I think.

PS (6 Oct. 2009): Look, look, now even the Big Blue has jumped on the bandwagon too! They offer the M2 enterprise data analysis platform, based on Hadoop and using PIG. See: http://www.sdtimes.com/link/33808. So it seems Hadoop is definitely a mainstream and business analyst thingie now, and as such it has lost much of its appeal, as far as I'm concerned.

---
* Jeffrey Dean, Sanjay Ghemawat: "MapReduce: Simplified Data Processing on Large Clusters", OSDI 2004, http://labs.google.com/papers/mapreduce.html
** unfortunately it's a two-part video: http://developer.yahoo.net/blogs/hadoop/2009/03/using_hadoop_to_fight_spam_-_part_1.html
*** Christopher Olston et al: "Pig Latin: A Not-So-Foreign Language for Data Processing", SIGMOD 2008, http://www.cs.cmu.edu/~olston/publications/sigmod08.pdf
**** http://ib-krajewski.blogspot.com/2007/08/erlangs-change-of-fortunes.html

Monday 30 March 2009

One cute microdesign example and some thoughts on language design.

This time I'll begin with a citation of another blogger*:
When I made the switch from C++, I'd wished that Java allowed operator overloading (i.e. custom implementations of operators such as '+', or the array brackets). However, some argue that this feature adds to the complexity of C++ with little advantage, and that there is no need to pass this complexity on to the Java language. Someone who never coded in C++ would more than likely agree with that.
OK, someone who never coded C++ won't understand that, and will take the argument about the undue complexity as received wisdom. So let me show you a little piece of design, a microdesign actually, so you can (maybe) appreciate the feature. If you're a Python/Perl/Groovy (or C++ for that matter!) programmer, I must apologise, but maybe you'll find this small example instructive nonetheless.

1. A piece of microdesign

It's a piece of code from my old project I wrote several years ago. I wanted a class for interthread and interprocess messages (you know, for my favourite pattern: share-nothing, message-passing actors). I wanted to be able to construct a message using parameters of the constructor like this:

    AbstractData notif;
    SocketMsg msg(DispInterface::crReqNotifType, &notif, father);
And then simply send the formatted message as binary data:
    m_connSocket.send(msg, msg.buffsize())
On the other side, I wanted to receive the binary data from socket directly into the message object, which would then parse it and provide high level access to the received data, just like:
    SocketMsg msg; // empty
    if(!m_connSock.receive(msg.writebuff(), msg.buffsize()))
    {
        // if too many errors, better bail out here!
        TRACE_AND_CONT("Cannot read from socket in an FD event handler!",
                       E_ID_Q3A_SOCK_ERR); // try to escalate
        return;
    }

    // look what message was received
    if(msg.getType() == cmdOkMsgType)
        handleCmdResp(msg.getAbstrData());
Pretty high level, eh? Just hide away all that low level data parsing and formatting and use the message!

It's design on its lowest level, you cannot possibly make a smaller piece of meaningful design, but I liked that piece! Now a question for the astute reader: what pattern is that? Proxy, bridge or a facade? Or a builder maybe? Perhaps none? Well, I don't think in patterns, I think in abstractions: what I needed was an abstraction of a self-formatting message. Don't you like it too? Unfortunately, you need some operator overloading magic to do this. Look at the following class:
class SocketMsg
{
public:
    SocketMsg(int type, AbstractData* message, DsetId dsetId = nillDsetId,
              int cmdId = 0, int userCode = 0);
    ... etc

    // conversions
    operator const char*() const { return m_buff; }
    inline SocketMsgBufferProxy writebuff(); // see inline section

private:
    // buffer
    static const int c_buffsize = sizeof(SocketMsgStruct) + 1;
    char m_buff[c_buffsize];

    // decoded data: don't remove in case we use different encoding!
    AbstractData* m_data;
    int m_type;
    DsetId m_dsetId;
    int m_cmdId;
    int m_userCode;

    friend class SocketMsgBufferProxy;
};

You can see in the above definition that the class has a buffer for the coded binary message and several fields for the decoded message values. In the char array context, it exports its internal message buffer for reading (through the operator const char*()) so the socket's send function can read the raw data from there. In the other direction, i.e. for receiving data into the internal buffer, a different mechanism is used: we are returning a SocketMsgBufferProxy object (this time through the writebuff() function), which is defined as follows:

class SocketMsgBufferProxy
{
public:
    SocketMsgBufferProxy(SocketMsg* sm) : m_sm(sm) {}
    operator char*();
    ~SocketMsgBufferProxy();
private:
    SocketMsg* m_sm;
};
What is that proxy object doing? It's forwarding the address of the internal raw data buffer, revealing it through a char* conversion operator again. As a matter of fact I could offer an even cleaner interface:
    m_connSock.receive(msg, msg.buffsize())
So why didn't I simply expose the internal buffer with the char* conversion operator, and instead introduced a SocketMsgBufferProxy object in between? Simply because of SocketMsgBufferProxy's destructor. When its destructor is invoked, it parses the raw contents of the receive buffer, so after the receive() line of code has finished, the message object is parsed and ready! Of course there is another possibility to achieve this: lazy parsing, but I was somehow more excited with that cool proxy object automatically coming in between and doing all the work! I'm only human too, and sometimes I'm looking for the unusual...
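
For completeness, a sketch of how the proxy could be implemented - parseBuffer() is an assumed private helper of SocketMsg that decodes m_buff into the member fields; it wasn't shown in the class definition above:

    SocketMsgBufferProxy::operator char*()
    {
        return m_sm->m_buff;      // expose the raw receive buffer for writing
    }

    SocketMsgBufferProxy::~SocketMsgBufferProxy()
    {
        m_sm->parseBuffer();      // the proxy temporary dies right after receive()
                                  // returns - parse the freshly received data now!
    }

    // and in SocketMsg:
    inline SocketMsgBufferProxy SocketMsg::writebuff()
    {
        return SocketMsgBufferProxy(this);   // temporary proxy object coming in between
    }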

2. And now some more general discussion


So as you can see, operator overloading is a pretty useful little feature. So why didn't the designers of Java include it in the language definition? My opinion is that they overreacted. I can still remember the times of the "object craze" when everybody tried to be object oriented without knowing what that really was, and to use every single feature C++ offered without thinking whether it was needed or not for the particular problem at hand. That was somehow similar to the "patterns craze" I can see even today: people asking me "so what pattern could I use here?" - you don't need a pattern here, you need a solution to the programming problem, or do you just want a cool name to stick on your code? But more about that in a future entry.

You wouldn't believe how stupid people got with operator overloading; maybe it was something like the "powder rush" of skiers, an incredible infatuation with the until-then unheard-of possibilities! You must consider that a freshly-made C++ programmer came from a much smaller language (namely C) and normally didn't have exposure to any higher-level language concepts from Smalltalk, Lisp or Simula. So the Java designers simply thought "Hey folks, it's time to cool down!" and banned the feature. To my mind, a fatal case of underestimating people's capabilities, and of patronizing. Because the result of it is that "you cannot do any magic in Java!" (well, actually you can with introspection and bytecode insertion, but only so much, and that's simply not enough). Look at the following citation**:

In fact, I'd say that many of today's current hot trends in programming are a direct result of a backlash *against* everything that Java has come to represent: Lengthy code and slow development being the first and foremost on the list. In Java you generally need hundreds of lines of code to do what modern scripting languages do in dozens (or less).
...
So what's the solution? Well, this is the problem. People have been tinkering with Java for years now, and there's still no hope in sight. There's something about the Java culture which just seems to encourage obtuse solutions over simplicity. As a Java developer, I was always so amazed at how difficult it was to use the standard Java Class Libraries for day-to-day tasks.
And you know what? My pet idea is that the reason why Java code is so dull and repetitive (and that the Java culture is, ehm..., you've just read it...) is the lack of operator overloading! You think I'm overly generalizing now? Well, the fact is, you simply cannot hide the gory details as conveniently as I was able to do in the SocketMsg class! The best indication that people in Javaland are thinking the same is the Groovy JVM programming language: it has operator overloading at its core, and is capable of doing things you wouldn't dream possible in Java. Well, that's no virtue in and of itself, but as a result it gives you much leaner code! And we all need that: less code!

3. Now something even more general


While I'm already at it: the Java programmers I met wouldn't understand the idea of the h-files (aka include files) either. Why are they needed, for Heaven's sake? Isn't it an awful bore? I admit that the origin of the h-files lies in the limitations of the early C compilers, but with C++ they gained another significance: a "table of contents" of the code proper. I give the word to another man from Java(FX)land***:

From my experience with C++ and Java, having method bodies in the class declaration clutters it with a mass of implementation details which is detrimental to getting an overview of the actual relationships and operations embodied by the class. It was for this reason that I decided to define the bodies of class operations and functions outside the class declaration.
Exactly what I think! You need an overview of the class, so you don't have to skip over every single int and bool declaration until your head gets dizzy. You'll retort that this can be done with Javadoc comments. Well... The Javadocs... That's quite a theme of its own. Let me give only a small example here: if I've already written the function declaration like:
    void sendMessage(SocketMsg& msg, Receiver dest);
why should I repeat all this in a trivial Javadoc header? Is it "don't repeat yourself" or what? Don't get me wrong, I wouldn't check in any uncommented code, but I don't use such comments in (C++) header files - they only clutter things up and make the definitions unreadable! For me they are well suited as implementation descriptions. And secondly, you have to run the javadoc tool first, you cannot just have a quick glance at the code! I must admit, sometimes I'm annoyed about having to edit the h-files, but when I'm reading the code afterwards, I'm happy I did it. Pretty non-progressive, isn't it?
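
To illustrate the "table of contents" idea with a made-up little Logger class: the header shows only what the class can do, the .cpp file hides how it does it.

    // --- Logger.h: the "table of contents", declarations only ---
    #include <string>

    class Logger
    {
    public:
        void log(const std::string& line);   // append one line to the log
        bool empty() const;                  // anything logged yet?
    private:
        std::string m_buffer;
    };

    // --- Logger.cpp: the implementation details, out of sight ---
    void Logger::log(const std::string& line) { m_buffer += line + "\n"; }
    bool Logger::empty() const                { return m_buffer.empty(); }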

4. Summing up

Maybe you'll say that it's not ethical to lambast a slightly ageing, slightly démodé, mainstream language, and that all of this has been well known for years now, but hey, it's just what I always thought! And like with my other favourite subjects (Agile, XML, Python, patterns), it simply takes time till I write it down, and it's not so insightful by then, sorry...

--
* http://www.ddj.com/blog/javablog/archives/2007/08/java_and_potato.html
** http://www.russellbeattie.com/blog/java-needs-an-overhaul
*** Chris Oliver, Creator of JavaFX, in his presentation about JavaFX: http://jazoon.com/jazoon07/en/conference/presentationdetails.html


Saturday 21 February 2009

Fibers, at last!

As I'm recently doing Windows programming again after a pause of some 10 years, I keep learning about new things. One of them is fibers. Come again? What's that??

A fiber, as the name implies, is a piece of a thread. More precisely, a fiber is a unit of execution within a thread that can be scheduled by the application rather than by the kernel. *

So fibers are just user space threads, aka cooperative multitasking, which I've known since the stone-age DOS times and which wasn't supposed to be cool at all!

It was native kernel threads that were supposed to be cool back then! But I must admit that in the past, when I was working with cooperative multitasking in a rather big, multithreaded system, I was quite happy with it! It gave a nice, predictable feel to the system; you just knew at what times which threads were supposed to run! The sole problem was (of course) that a long-running user task could block the whole system. But then we moved on to native threads and I forgot about cooperative multitasking.
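
For the curious, a minimal sketch of the Win32 fiber API - the application, not the kernel, decides when to switch (error handling omitted):

    #include <windows.h>
    #include <iostream>

    LPVOID g_mainFiber = 0;

    VOID CALLBACK WorkerFiber(LPVOID /*param*/)
    {
        std::cout << "worker: doing a slice of work\n";
        SwitchToFiber(g_mainFiber);                  // yield cooperatively
        std::cout << "worker: doing another slice\n";
        SwitchToFiber(g_mainFiber);
    }

    int main()
    {
        g_mainFiber = ConvertThreadToFiber(0);       // the current thread becomes a fiber
        LPVOID worker = CreateFiber(0, WorkerFiber, 0);

        SwitchToFiber(worker);                       // run the worker until it yields
        std::cout << "main: scheduling the worker once more\n";
        SwitchToFiber(worker);

        DeleteFiber(worker);
        return 0;
    }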

But something has changed since then: first, as already said, Windows has got them, although allegedly only as a good-enough implementation to support the SQL Server (fibers have severe restrictions on the kinds of Win32 APIs they can call!). Second, Oracle 10g has got them too, I cite**:

Fiber model support
- An extension of the thread support already in place since Oracle7. Users may now run the database in fiber mode, which employs Oracle-scheduled fibers instead of O/S scheduled threads.
- For CPU intensive apps, this will provide a performance boost and reduce CPU utilization.

Moreover, recently I discovered a fiber library for Windows XP, named FiberPool***, which remedies the insufficient native Windows fiber support. And then there have been Green Threads (user-space, cooperative multitasking threads) in Java's JVM for some time now! The recent Windows 7 beta has introduced User Mode Scheduled Threads (UMS Threads)****, which are like Win32 Fibers, as they can be scheduled by hand, but are backed by real, bona fide kernel-level threads! Sounds rather interesting!

So at last it's the good old cooperative multitasking rediscovered? There's growing evidence! Will the good ol' times of DSET maybe be back then, Stefan?

---
* Johnson M. Hart: Windows System Programming Third Edition, Addison Wesley 2004
** http://download.oracle.com/owsf_2003/40171_colello.ppt
*** http://www.fiberpool.de/en/research.html
**** http://blogs.msdn.com/nativeconcurrency/archive/2009/02/04/concurrency-runtime-and-windows-7.aspx

Wednesday 18 February 2009

C++ Server Pages

In my previous blogs I wondered if there is/could be a C++ implementation of Java's Servlet interface(s)*, and recently a friendly fellow blogger** pointed out to me that there already is one, viz. CPPSERV: http://www.total-knowledge.com/progs/cppserv/!

Can you see C++ in its logo?

I cite*:

CPPSERV is a web application server that provides Servlet-like API and JSP-like functionality to C++ programmers.
...
CSP (C++ Server pages), the JSP equivalent for CPPSERV, are currently implemented as a precompiler that converts CSP source to C++ source of a servlet, which, in turn needs to be compiled into CPPSERV servlet.
...
Current CSP implementation supports basic parsing, as well as compile-time taglibs.

Well, that's it! Or perhaps not? What we have is the equivalent of Java's JSPs, which generally suck, and standing alone are too weak a framework for serious work. But hey, it's not that bad, as we have got some choices: the CPPSERV for the oldskool JSP-like programming, the late Bobcat for more of the same, or the Wt framework*** for more modern, GWT-like work!

And with some discipline (I mean the model 1 ;-))) servlets, do you remember???) it's possible to get a working application in C++ quickly! And all those Java frameworks out there are only trying to hide the JSP processing model, and they aren't that beautiful themselves. So for smaller things a JSP-like solution will probably be sufficient, and for more modern applications (including AJAX support) I'd choose the Wt framework.
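
Just to make the idea concrete - this is NOT the actual CPPSERV API (I haven't checked its headers), merely a made-up sketch of what a Servlet-like interface in C++ could look like:

    #include <string>
    #include <ostream>

    // made-up interfaces, loosely modelled after javax.servlet
    struct HttpRequest
    {
        virtual std::string parameter(const std::string& name) const = 0;
        virtual ~HttpRequest() {}
    };

    struct HttpResponse
    {
        virtual std::ostream& out() = 0;
        virtual ~HttpResponse() {}
    };

    class Servlet
    {
    public:
        virtual void service(const HttpRequest& req, HttpResponse& resp) = 0;
        virtual ~Servlet() {}
    };

    // a "model 1" style servlet: request handling and HTML output in one place
    class HelloServlet : public Servlet
    {
    public:
        void service(const HttpRequest& req, HttpResponse& resp)
        {
            resp.out() << "<html><body>Hello, "
                       << req.parameter("name") << "!</body></html>";
        }
    };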

There remains the question of the OO-Relational mapping, and the creators of CPPSERV suggest using their SPTK classes****, but that seems to be only a JDBC-like DB access layer. I suppose I'd rather take Qt's DB classes for that, but on the other hand, they are somewhat heavyweight. The OO-Relational thing for C++ is a different question in itself. But JDBC-style access isn't that horrible either, and on the whole I think there are some viable options for C++ web applications out there!

Now I guess I should try out the two frameworks, but ... ehm ... you know, so little time, so many books...

PS (07 Oct. 2009): meanwhile I have discovered two other C++ and C "Servlets". Well, they aren't really Servlets, but rather one step back on the evolutionary path, and are somehow JSP-equivalent. The first of them is named KLone and allows you to include C scripts in the HTML page in the exact JSP manner. The other is GWAN - it's free and it allows you to write "servlet" scripts in C, which will be executed by the server and generate the HTML output - so in this case we are almost there, except for the very limited set of functions you can use in the script. Nonetheless, very interesting!

--
* see: http://ib-krajewski.blogspot.com/2008/10/c-servlets-again.html
** Gavino, unfortunately his profile isn't available :-(((, but nonetheless, thanks bloke!
*** The Wt-framework ("witty" - http://www.webtoolkit.eu/wt/)
**** SPTK: mainly a GUI framework, but it has some DB access support as well: http://www.sptk.net/index.php


Tuesday 27 January 2009

Reading the new C++ Standard

Recently I've read on my local German C++ forum* that there was a new draft document for the C++09 Standard** and I immediately rushed out to read it. I mean, something tangible at last! Well, actually it wasn't really a read, but rather a very superficial skimming-over. There are good reviews of C++09 on the Web***, so I didn't actually want to read it in depth, but rather to get an overall impression of the changes: just how it all would feel in C++0x, x < a.

And guess what? It was last year in November (sic!!!) when I read it and then wrote some things up :-(((. Somehow I'm too busy with my current project! Of course I'd like to read the proposal in depth and analyze it here thoroughly, but alas, it's not possible. So let's start with just a few of my very personal impressions:


1. Ranges and initializers


What I perhaps liked best of the new features! Now I can write things like:

    map<type-expr> m = {{}, {}, {}};
    int a[] = {1};
    vector<type-expr> v = { a, b, c };
    SomeClass obj({ a, b, c, d });
and that for any class I want! How is that done? You have to implement an initializer_list constructor for your class! Wow, it feels just like Groovy to me :-). A great improvement over the oldskool tricks used to force initialization where it wasn't possible by the language definition!
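
For example, a minimal sketch - the SomeClass here mirrors the snippet above, but its contents (a vector of ints) are just my own assumption:

    #include <initializer_list>
    #include <vector>

    class SomeClass
    {
    public:
        SomeClass(std::initializer_list<int> values)      // the new magic constructor
            : m_data(values.begin(), values.end()) {}
    private:
        std::vector<int> m_data;
    };

    SomeClass obj = { 1, 2, 3, 4 };   // brace initialization, just like for arrays!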

A related feature I liked is ranges:

    for(int& x: array)
    {
        x = 0; // for example!
    }
This is accomplished internally by a Range<> class being instantiated with the array instance: Range<type-expr>(array). It gives you a nice, scripting-language-like feeling when iterating; I mean, even Java's got it, and the boost::FOREACH macro was, in my personal opinion, more of an ugly hack than anything else!

When talking about ranges: actually I was expecting that the ranges would be included for the STL algorithms, so we don't have to write the annoying:

    find_if(v.begin(), v.end(), not_zero());
each time. I was quite positive that we were going to have something like:

    find_if(v, not_zero());
as an overload of the traditional STL signature using the new Range<> class (there is a boost::range library after all!), so I must say I'm a bit disappointed here.
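
Well, nothing stops us from writing such an overload by hand in the meantime - a tiny sketch (not_zero being some predicate functor, as above):

    #include <algorithm>

    template <typename Range, typename Pred>
    typename Range::iterator find_if(Range& r, Pred p)
    {
        return std::find_if(r.begin(), r.end(), p);   // forward to the classic algorithm
    }

    // usage: vector<X>::iterator it = find_if(v, not_zero());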


2. Some nice utilities I found


There was a surprisingly large number of small, interesting things to be discovered, like:

  • error support: error_code, system_error - well, I can definitely use some of it!
  • compile time (i.e. template) rational arithmetic - surprise! BTW do we really need it?
  • clock and time_point classes - useful!
  • stoi: string to int conversion - a surprise! So we're not taking the lexical_cast<> road as I was expecting all the time?
  • alignment control - at last I can say what the alignment is, and not the machine! Will the machines revolt then?
  • unique_ptr - just what we need most often, or auto_ptr fixed!
  • addressof - even if the actual & operator is overloaded!
  • reference_closure as Callable - what was that again??? But I wrote it up in my original notes, so it must be something funny!
  • delegating constructors - we can reuse one constructor in another one, like in this example from the draft document:
      struct C {
    C(int){} // 1: non-delegating constructor
    C(): C(42){} // 2: delegates to 1
      };
  • nullptr as a keyword - no more annoying (XXX*)0 casts!
I liked these little things, and there is probably more like that to be discovered. It's refreshing to see that the smaller problems of day-to-day programming are addressed too!
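
Just to see a few of them in action, a quick sketch in the new syntax (the Widget struct is made up, of course):

    #include <string>
    #include <memory>
    #include <chrono>
    #include <iostream>

    struct Widget { int id; };

    int main()
    {
        int n = std::stoi("42");                        // string-to-int, no lexical_cast<> needed

        std::unique_ptr<Widget> w(new Widget());        // auto_ptr fixed: sole ownership
        w->id = n;

        Widget* raw = nullptr;                          // no more (Widget*)0 casts

        auto start = std::chrono::system_clock::now();  // clock and time_point classes
        // ... do some work ...
        auto elapsed = std::chrono::system_clock::now() - start;

        std::cout << w->id << " " << (raw == nullptr) << " "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count()
                  << " ms\n";
        return 0;
    }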


3. Some expected things


As expected, we've got the hash tables at last, and actually the whole TR1 stuff - regular expressions, type traits for templates, tuples, general binders and function objects - is in there too. Plus the move constructors and rvalue references.

Moreover the auto keyword and the lambda functions are there, so now we can write

    auto iter = m.begin();
    // and not
    map<string,string>::const_iterator iter = m.begin();
at last! Just think how many keystrokes/copy and pastes it will save!

Speaking of lambdas - as I read it, there isn't a single example of a lambda function in the draft! And moreover, in "Chap. 14.3.1 Template type arguments" something is missing: namely the old restriction about the non-global linkage of template arguments! This means I can write a functor in local scope and pass it to an STL algorithm like that:
    void XXX::someFunc()
    {
        struct my_not_zero {
            bool operator() (const X& p) { return p != 0; }
        };

        // use the local functor
        find_if(v.begin(), v.end(), my_not_zero());
        // OR use a lambda function:
        find_if(v.begin(), v.end(), [] (const X& p) -> bool { return p != 0; });
    }
I'd say, what was the reason for lambdas again (at least for me)? To be able to write a functor which doesn't have to have global linkage! Are lambdas obsolete then? Well, both yes and no. Just look at the code example above: writing a local functor still requires quite a lot of plumbing, and a lambda expression is somehow more terse (we could even omit the bool return type in our example!). IMHO both of them are not wholly satisfactory. Using a lambda library solution I could write it in a much more terse way:

    // OR using my pocket lambda library:
    find_if(v.begin(), v.end(), _$1 != 0);
Another expected (and somewhat dreaded) new thing are the Concepts. I must say they are everywhere in the standard library section, like in this new std::list move constructor:

    requires AllocatableElement<Alloc, T, T&&> list(list&& x);
I mean, now the standard library header files will get even less readable, but the compiler errors will (hopefully) get more readable in exchange. So I'm in two minds about this feature: isn't it too much type system? What about duck typing in templates - are we giving it up? For me it was a very nice feature. I guess it won't turn out that bad, and templates without Concepts will keep working as before. But are Concepts only for compiler support, or are we supposed to use meta-typing everywhere?

The next expected thing, which however deserves its own section is:


4. Multithreading support

Sorry, I didn't read it yet... It's two whole chapters (29 and 30) and they aren't exactly short.

But instead it started me thinking: how was I writing multithreaded applications before the C++ memory model was defined? How on earth was this possible? Well, it was all vendor (i.e. compiler) specific. When a compiler supported the POSIX thread library (pthreads), for example, it would place some memory fences around the pthread calls****. It would treat some pieces of code in a special way - just like Java's String magic. The local static initializers would be automatically guarded by locks (GNU compiler) or maybe not (Sun compiler). The volatile keyword would be given an extra visibility guarantee (Microsoft compiler). And I never initialized global static variables outside of the multithreading part of the program, only in the main thread portion. And the best of it is that none of these techniques was guaranteed to work****! But hey, the 1st Java memory model was buggy too, wasn't it?

Now it's all supposed to be portable.

One thing which I did read up on (however on the Web and not in the proposal itself), and which was a little unexpected for me, is about the volatile keyword. The proposal didn't strengthen the volatile keyword in the way it's been done in Java or C#. I suppose that, for reasons of backward compatibility with C, volatile remained only a compiler (dis)optimization directive, while for visibility guarantees between threads we are supposed to use the atomic<> library!

How will the old programs using volatile for visibility between threads be running now? Are they broken? I suppose so; they probably must be rewritten using the atomic<> types.
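
A minimal sketch of the new, portable way to publish a flag between threads - std::atomic<> instead of (mis)using volatile (the C++0x std::thread is used here just for the example):

    #include <atomic>
    #include <thread>
    #include <iostream>

    std::atomic<bool> done(false);    // visible across threads, with ordering guarantees

    void worker()
    {
        // ... do the actual work ...
        done.store(true);             // publish the result to the other threads
    }

    int main()
    {
        std::thread t(worker);
        while (!done.load())          // we're guaranteed to see the worker's write
            std::this_thread::yield();
        t.join();
        std::cout << "worker finished\n";
        return 0;
    }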


5. Summing things up

My impression so far is that there are many small things you wouldn't expect, and that they are fun! I was expecting the big new chunks of functionality (and they were indeed added) to be the most impressive ones, but it's rather the small things which make the difference. Look at that code snippet in the new, cool layout style (which I picked up in this February's issue of MSDN Magazine):

    for_each(values.begin(), values.end(), [](vector<int>& value)
    {
        // now we can nest STL at last! (or can we???)
        for_each(value.begin(), value.end(), DoSomething);
    });
I'd dare to say that C++ seems to have put away its 90s image, as it allows us to write programs more in the style of the scripting languages than of the old and venerable K&R C! I can only add "at last" and "Amen".

---
* Thanks Jens!
** Herb Sutter's announcement and some discussion with the inevitable C++ bashing: http://herbsutter.wordpress.com/2008/10/28/september-2008-iso-c-standards-meeting-the-draft-has-landed-and-a-new-convener/, the Draft document itself is here: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2798.pdf
*** like the article in the Wikipedia, which I found to be surprisingly good: http://en.wikipedia.org/wiki/C%2B%2B0x
**** Hans Boehm: "Threads cannot be implemented as a library" Technical Report HPL-2004-209, HP Laboratories Palo Alto, 2004, http://www.hpl.hp.com/techreports/2004/HPL-2004-209.pdf