On Software and Languages: 2011

Thursday, 15 December 2011

DSLs not so pointless?

... or: busting myths and fallacies, this time - the DSL-s!

Until recently I'd only sneer at the idea of DSL's: what the @!%*, either we have a soild API for developers, or a specialized higher level language for the specific task. And the higher level language best shouldn't be done at all, because the non-technical user just wants a smooth GUI to click his requests together! So spare your DSL hype on me, go and find another gullible middle management person.

But then I read the following passage and the fallacy behind this reasoning became apparent to me:

In other environments I could write tests together with the product owner. Together, we could create and discuss tests that expressed what we wanted our code to do without getting buried in the details. The tests I wrote in C++ were quite unreadable for a non-C++ developer.

What was my error? DSL isn't meant as a programming language, neither for the developer, nor for the notechnical user - it is meant to document things!

Look at the following example of a testing DSL in C++:

  #include <igloo.h>  // testing framework*
  #include <cell.h>   // tested class
 
  using namespace igloo;

  Describe(A_Cell)
  {
      It(should_have_coordinates)
      {
          Cell cell(3,4);
          Assert::That(cell.getX(), Equals(3));
          Assert::That(cell.getY(), Equals(4));
      }

      It(can_detect_if_its_a_neighbour)
      {
          Cell cell(1,5);
          Cell neighb(2,4);

          Assert::That(cell.isNeighbourTo(neighb), IsTrue());
      }
  };

You must say it's a piece of C++ code** you could show to anyone and discuss it as a kind of formal spec. You and your partner don't have to learn Z or some other specialized notation. What both of you know already is just enough to bridge the gap. And it's cool code as well!

--
* see: http://igloo-testing.org/
** it's not a joke, all this is possible in C++

Friday, 11 November 2011

Geeky Halloween costumes...

Matching nicely my last post abot the Go language: dress up as a programming language mascot:

(as appeared in Golang blog).

Any other languages to follow? For Perl it's simple (a camel), for OCAML too (a supercharged camel), but what about C++? Or Assembler?

PS: My favourite C++ pic is this:

. But you really can't dress up like that, can you?

Tuesday, 25 October 2011

What I did like about Go.

Last week I've read the "Learning Go" book. I did that by sheer coincidence: it happened that I had the (free) book loaded on my eBook reader and just looked for light but interesting read. Why not an intro to a new programming language? So I started to with it and I didn't regret my decision. It was an entertaining and interesting lecture. I'd like to describe here what I liked about the language.

See, normally reviewers concetrate on what they didn't like in a language, but I think that's way too easy! The things we don't like are most probably the things we aren't accustomed with, and with some experience they wouldn't annoy us any more. So don't gripe, just be easy on Go!

As I learned from the book, Plan 9 project*, the (supposed to be) successor of Unix, included a language called Limbo which was a lot like Go already, which built on an earlier langauge called Newsqueak. So surprisingly, there's a lot of history showing up here!

Go itself is like it's mascot (look to the right!): small and quick. It's compiled and statically typed but does type deduction (like OCAML or Haskell). It main claim to fame are (beside garbage collection) goroutines and channels, but I have too little experience with them, so I'd rather concentrate on basic day to day traits of the language. I mean, the small things are the things that annoy us the most, and should a language will be anoying, it will screw them. Additionally, this way we can somehow transmit the "feel" of the language.

So let's start with:

1. multiple assignement

...a la Python:

  x, y := 11, 12
  _, y := 22, 23

As you can see the semicolons are not needed too - gives you a nice Python / F# / Clojure feeling!
If you have mutliple assignament, you get multiple return values for free, as in the following function**:

  func Write(file *File, b []byte)(int, Error)

See? You just specify the returns in parens after the parameters. And that leads us to the next goodie connected to the multiple assignment:

1a. the ok-idiom

it goes like that:

  if ok, x = foobar(y); ok {
    // do sth
    ...
  }

So no more if(foo() == -1)! Instead you can write:

  if nr, err = write(f, buf); err == nil {
    // do sth
    ...
  }

You can assign and test a value inside of the "if "clause. Additionally the parens can be omitted too with "if" - more Python / F# feeling! Nice, isnt it?

2. named return parameters

We can assign names the return parameters:

  func write(file *File, b []byte)(n int, err Error){
    ...
    
    return
  }

When we name the returns, the compiler will initialize them with defaults (i.e. their zero-values)! We can eithet refer to them inside of the function (!!) or even omit them from the return statement, and then they will be used automatically with the initialized values. One thing less to worry about.

3. maps (and strings)

They are part of the language and not in the library. I wish they'll be that way in C++ too. Come on, D has done that!. Add this to the {} initalization syntax and you can write (almost LISP, Python or Perl style)***:

  var opmap = map[int]string{
    ADD: ”+”,
    SUB: ”-”,
    MUL: ”*”,
    DIV: ”/”,
  }

Well, speaking frankly, the {} initialization is possible in C++ too, alas only as of C++11.

4. defer for destructors

You can define Deferred actions, i.e. actions which are executed on the end of scope only:

  file.Open(”filename.in”)
  defer file.Close()

It's a nice replacement for destructors (see my previous post here). And it looks better that some "using" constructs from other languages.

5. duck typing

Go has duck typing, but it's not C++ type of template duck typing. You don't have to mention a specific interface (yes, Go has a notion of interfaces!), it suffices to implement them ***:

  type I interface {
    Get() int
    Put(int)
  }

And then you just implement the required functions for a data type (see note ** below), and it will automaticall comply with the interface I:

  type S struct { i int }
  func (p *S) Get() int { return p.i }
  func (p *S) Put(v int) { p.i = v }

like that:

  func f(p I) { p.Put(1) }
  var s S 
  f(&s)

OK, so whats the difference to C++ type duck typing? C++ doesn't care about types: a template with two parameters simply takes two things, and then tests if they have the methods the template code requires:

  template <type T> void foo(T& t)
  {
    if(t.canBark()) t.bark();
  }

How's that different from Go? C++ gets problems with overloading in that context: it sees only a function that takes 1 thing. A totally generic thing! So you cannot define a second template with the same name and a different argument type, because at the definition level there aren't types, only things! That's why there's no range or complete container overloads for SLT algorithms (see H.Sutter's blogpost).

6. generic switch

Just like Groovy, Go has a very flexible switch statement:

  switch c {
    case ’ ’, ’?’, ’&’, ’=’, ’#’, ’+’:   
      return true
    case c < 'a': fallthrough
    case c > 'A':
      return true
    default:
      return false
 }

I mean, ever language needs something like that: case switch for strings and other types! And did you notice that fallthrogh must be stated explicitly? I like that.

7. the go keyword.

I liked the idea of introducing the word "go" as keyword to start a parallel "goroutine":

  int i
  go foobar(i)

I just find it funny :), that's all...

Note:
I admit, I didn't describe the features that make Go to stand out, only a couple of those which particularly appealed to me. For more info on: function definitions, val notation, control structures, function literals (i.e. lambdas), goroutines, etc please read the book by yourself!

---
* a note aside: as pointed out several times on the web, there's a notable similarity between the logos of Plan 9 and Go. BTW: don't you think they are cute?

** more precise, you'd rather define a function like this:

  func (file *File) Write(b []byte)(int, Error)

i.e. a function working on the File type, and invoke it like:

  nr, err = f.Write(buf)

But this is the part of the OO stuff so I skipped it in this post.

*** code example from the "Learning Go" book.

Sunday, 26 June 2011

2009 Languages Overload

This year (or more to the point, that year, i.e. 2009) was for me a year of languages. Or maybe better, of languages overload! While I'm earning my bacon mainly with C++ (and a bit of Python), this year I had a look at a couple of new languages (adding to my last year's Groovy adventures) like: C# 3.0, Go, Scala, Clojure and even a bit of Haskell. In case of Haskell I'm feeling rather guilty, as I wanted to keep myself busy with Haskell this year but didn't manage more than some superficial reads about monads, and only becaue I needed to explain some Clojure code! Shame!

1. Java

But let me first comment on another language, which is ly dying these days: Java. Is it really dying? Looking at this post by codemonkeyism you'd say so. Why don't we like Java anymore? Let me cite:

Inheritance: outside of frameworks, inheritance is inflexible, leads to tight
coupling and is plain bad. Composition is most often a better choice. As are

mixins and traits

Noisy syntax: Lately there has been the enlightenment that too much noise in a language is a bad thing. Java is especially noisy in closures (anonymous inner classes) and generics.

Null / NPE: Null as the default for object references was a billion dollar mistake. An object should by default need a value. Otherwise NPEs will proliferate through your code. Newer languages prevent nulls or make the null behavior the non default one

List processing: As shown by functional languages, list processing should not be done
in loops. ... Java should have native support for easy list processing, not via the – best we have – constructs in Google Collections.

For me the 2nd problem is the worst, i.e. the noise. It's appalling how much of the Java boilerplate can be thrown away - check this presentation to see how much can be done with Groovy! But each of the new languages does a pretty good job here. Well, to be true, my biggest problem wasn't stated above at all mentionae above - lack of operator overloading! Whatever, the fact is that there's a host of new languages out thre, and the times seem to be more exciting than ever since the 70-ties!

2. Scala

Programming in Scala: A Comprehensive Step-by-Step Guide, 2nd Edition

Scala seems to be a language having every single feature from any language known to man. The overall impression was: OK, I know that one from ... One single feature I liked is the solution to the multiple inheritance conundrum (you know, the Deadly Diamond of Death) by linearization of the base classes. Clean, efficient, impressive! From the above list it's solving the Null/NPE, noise and inheritance problems. But the syntax is ML derived! Hate it! if this is a Java replacement, why cannot we stay in the old good C-lands? Another problem with Scala is poor tool support: I wasn't able to get the Eclipse plugin running, despite applying various workarounds found on Scala's homepage and upgrades of both the platform and the plugin. Even Martin Odersky (Scala's creator) is using Emacs for work!

On the other side, I think Groovy solves the list processing more elegantly, and it's DSL capabilites are more convincing too! (example to come)

3. Clojure

Programming Clojure (Pragmatic Programmers)

I like Clojure! I don't know why. It's not typesafe, has Lisp-y syntax and very complicated mechanism for integration of Java class libraries, but it is somehow elegant and minimalistic. It includes mutithreading primitives in the language definition and it tries to achieve good performance while staying immutable!

As I watched this presentation by Rich Hickey I noticed one important thing: he's an architect who was facing design problems in complex real life systems* (somehow like me, although I must admit that he's been exposed to more complexity than me) an tried to find a solution. And he really knows his stuff - all the conventional wisdom of software architecture! And Clojure (and Clojure's features) is an answer to these problems: it takes from LISP what seems suitable and isn't afraid to change it** otherwise.

Contrast this with Scala - sometimes feeling like an academic exercise in trying implementing all the features known to man in a single programming language just to show that it can be done (caveat: totally subjective opinion here). Maybe this pragmatic legacy is one of reasons why I like Clojure much more then Scala.

4. F# (and C#)

C# 3.0 Unleashed: With the .NET Framework 3.5

C# seems to be evolving very rapidly. From the ugly duckling it was a couple of years ago, it developed a respectable language with a decent amount of innovation. It has good lambda function support and the LINQ concept is rather interesting. I was surprised!

F# in its turn seems to be syntax-wise another ML derived langage like Scala, but it's not so overloaded with features. Sorry, I'm still not done with the free F# book...

Some (for example Scala Lift's creator here) say that it could be the single one of functional languages to success in the mainstream!

5. Go-lang

I didn't like it that much despite of its heritage. The obligatory, one and only true formatting rules, lack of subclassing... But on the other side, there is a CSP implementation on the language level! And as another programmer, who actually tried it out, stated here, it's much like writing in Python, only without its excruciating slowness. So maybe I should switch from Python to Go? Google Application Engine supports it now!

Update: in the meanwhile I managed to have a look at Go!

6. Summary

Warning: You see, it's not the comprehensive comparision I wanted to write back in 2009/10, but before I ditch the project altogether, I'd rather write this stub and extend it when I find some time. But the summary is already complete now, so... ---->

As I said above, I'd use Scala as a "better Java" as to avoid Java verbosity, lack of operator overloading and multiple inheritance. I'd use Closure for the joy of if (pun intended), I'd like some of my clients to want me to use Go, but wouldn't propose it myself, and I probably won't use C# unless I switch to .NET which is rather unlikely. On the other side, I might install F# on my PC - just to play with it a little.

--
* according to InfoQ "...Rich has worked on scheduling systems, broadcast automation, audio analysis and fingerprinting, database design, yield management, exit poll systems, and machine listening."

** e.g. even the venerable parentheses!

Exegi Monumentum...

"Exegi monumentum aeris perennis"* - I don't know why but this line is always coming to my mind when I'm thinking about my father, who passed away more than 2 years ago. Just because he was fond of classical studies and Latin?

He also cherished that other lines of poetry, in a language that you can't possibly know, but which, as I can assure you, are work of a genius (I cite form memory):

"Wsrod takich pol przed laty, nad brzegiem ruczaju,
na pagorku niewiekim, we brzozowym gaju,
stal dwor szlachecki, z drewna, lecz podmurowany,
swiecly sie z daleka jego biale sciany,
tym bielsze, ze odbite od ciemnej zieleni,
topoli, co go strzegly od wiatrow jesieni."

You're right, that's a single sentence. And he could recite one after another in the endless flow of the language. Yes, you are right again, this is a hommage to my late father, which I'm writing 2 years too late. The first thing I'll always associate with him will be the classical poetry. The another one, linked via the above phrase of Horace, is a story of an interview and things which it revealed to me in retrospect.

It was my school appointment to interview someone and I choose to interview my father. I remember just a single question I asked: what do you want to be remembered for? The answer was somehow unexpected to me: he cited the line of Horace, and said: I want to be remembered for the things I accomplish. And he did, he was working in local political bodies and was involved in building roads and water supply lines in our rural neighbourhood.

So when it came to finding a vocation, I wanted to build things! Not maths, physics, literature or music (things I was interested in), but engineering!

As for us in life there are 2 things which are worth of pursuing: for yourself, trying to understand the great scheme of things in that world (for practically minded, if you are more ambitious it's the universe then). But secondly, you owe it to the others, those before you and those to come after you, to keep the world running smoothly, and build new smootly running things to replace old ones.

I think, that's my father's heritage. And that's why I cannot be a geek - because I grew up listening to poetry, looking at nature of the rural coutryside and experiencing very practical achievements in very real world.

--
* fom Horace, translation mine: "I build a monument [to stay] for perennial ages"

Monday, 6 June 2011

Agile Rant Update

After you've read my Agile rant you may have though: "just another old-fashioned hacker who just doesn't get it...". Fair enough, I'm myself not that sure if that what I'm saying isn't just result of lack of experience with Agile? Maybe with even more exposure all my reservations would vanish?

So I'm happy to find out that there are other people thinking roughly along my lines. For example here: Agile Hokum, in DDJ reader mail. He (the reader)says exaclty what I said:

But the "TDD corrupted by writing the tests after coding" is about as extreme as the Taliban. Testing is good. Write them before, during, after, much after, etc., all have their place.

and doubts usefulness pair programming as well. The standard TDD answer of the expert goes than like that:

I agree that tests written at anytime will generally have value. However, TDD was not designed as a way of validating code — plain unit testing will do that. TDD is first and foremost a design practice. And so doing TDD by writing the tests after the code is simply a misunderstanding of what you’re actually doing.

Well, that opinon is already dealt with in my Agile rant. But taking the simplistic road I'd just say that's the way it is handled in agile projects: not as a design technique but rather as a testing one. OK, here seems to be some confusion on the term itself: it's either Test Driven Development or Design...

On the other side the author (i.e. the reader) openly admits that he's not an Agile practitioner, so that opinion doesn't give any earnest support for my criticism?.

But look what an ex Googler says here:

This is not to say that all unit testing is bad--of course not. The dispatch logic in Guice Servlet and Sitebricks benefit from rigorous, modular unit tests. If you have ever used Gmail, Blogger, Google Apps, AdWords or Google Wave (all use Guice Servlet to dispatch requests) you have seen the benefits of this rigor first-hand. But take it from me, we could have achieved the same level of confidence with a few well written unit tests and a battery of integration tests. And we'd have been in a much better position to improve the framework and add features quickly.

That strikes a chord with me, because it's exactly the way I tested my last system - a lot of integration and pre-integration tests, and a couple of unit tests for classes which couldn't be easily tested thorougly that way. Wouldn't that be allowed at your company? Because this isn't Agile?

Saturday, 28 May 2011

Don't Decay your Arrays!

OK, that's nothing new, the technique is old, will be obsolete soon with the new std::tr1::array, and why should we use naked arrays anyhow? But recently I couldn't write it down as easily as I would like to, and as it's useful in the pre C++2011 code, why not discuss it shortly here?

Imagine you have to implement a Netbios/SMB remote file access protocol. The problem we are faced with while doing this is trivial but important - we need to log different packet types, operation codes and command options for debugging. They are all defined as integer type values and C++ doesn't (nor does any language I know) offer much support in that respect. So we define our own mini framework using macros. Don't panic, macros aren't that bad as they are told, and in our use case they are a perfect fit: translating between code (integer value) and strings (its descriptions). It's like eval() but in the other direction :). And it is working like this:

  // trace helper tables
  #define DEF_DESCRIPTION_ROW(a) {a, #a}
  struct CmdNameEntry { int cmdId; string cmdName; };

  const CmdNameEntry g_cmdNameTableNetbios[] = {
      DEF_DESCRIPTION_ROW(SESS_MESSAGE),
      DEF_DESCRIPTION_ROW(SESS_REQUEST),                             
      ...
  };
  const CmdNameEntry g_cmdNameTableSmb[] = { 
      ...
  };
  const CmdNameEntry g_cmdNameTableTrans[] = { 
      ...
  };
  const CmdNameEntry g_cmdNameTableTrans2[] = {                 
      ...
  };

Got it? We use C's stringize operator (#) to convert a constant's name to a string in compile time. You see, we need quite a couple of tables, and each of the tables has many entries (you have to believe me, Netbios & Co are rather a chatty bunch). So why are we using naked arrays? Because the std::vector class didn't have a convenient initializer before C++2011*, a shame! Alternatively you could copy the array's contents to a fresh vector just to pass it to futher processing, but how ugly is that?

Now you'd need to write something like this to translate between integer codes and their descriptions:

  getCmd(g_cmdNameTableSmb, sizeof(g_cmdNameTableSmb)/sizeof(CmdNameEntry)),

because in C/C++ you cannot define a true function on arrays like this:

  getCmd(CmdNameEntry table[N]).

Well, it seems you can after all, but it's of no use anyway because (as you know) inside of the function the array will decay to a pointer and the size information will be lost. I don't know how other people may feel about it, but after some time I got pretty annoyed with that. Couldn't we wrap this in a macro, or even better, use some template trickery? Let's try and write a small utility:

  template<typename T, size_t N>
    string SmbProtocol::getCmdName(int code, T(&table)[N], const char* errorStr)
  {
      for(size_t i = 0; i < N); i++) // <- oops!!! compile error!

      {
          if(table[i].cmdId == code)
              return table[i].cmdName;
      }    
      return errorStr;
  }

Uh-oh, that doesn't compile, but frankly, how should it? N isn't a runtime construct, it is type information, and it lives only at the compile time! What we need is something to translate between type and value, i.e. a bridge between compile and run time. It appears that's not so complicated as it sounds, just write**:

  namespace util
  {
  template<typename T, size_t N>
    size_t array_size(T(&)[N]) { return N; }      
  }

As we are in a template, we can access the type information, and as we are a function, we can return a value. Bridge ready! Now we only need to replace the incorrect line with:

  for(size_t i = 0; i < util::array_size(table); i++)

and now we are ready to go:

  string SmbProtocol::getNetbiosCmdName(int code)
  {
      return getCmdName(code, g_cmdNameTableNetbios, "UNKNOWN_SESS_MSG");
  }

Now we can define new description tables, forward them to functions using the pair of T(&table)[N]and util::array_size() and let our pre C++2011 code look a little more elegant.
--
* You can have a look at this previous installment of this blog for the new C++0x initialization syntax .
** if you wonder about the array parameter syntax look here for an explanation: template-parameter-deduction-from-array-dimensions

Thursday, 26 May 2011

Bad Case of Code Reuse

Bug story!

We all strive to write elegant, concise and clear code. One of the tenants of this pursit is the "DRY" principle - don't repeat yourself! But as usual you shouldn't always follow a principle blindly. Take for example the following bug which caused me some problems at customer's premises.

I just extended the client library of the system I was in charge of at that time to suppport parallel requests, as the legacy server code would would be too risky to be refactored. Why? Well, not everyone has the luxury to have (or even sometimes to be able to define) a comprehensive set of unit tests. In case of the legcy product there were none, and worse, the sheer number of possible inputs (any satellite data, corrupted or not) does not allow the system to be wholy tested!

I decided to supervise long running tasks and send a short running ones in parallel, so the concurrency level would be bounded by 2. At two points in the code I have got to check if the next task is a long-runner or not: when a single new task is submitted for processing, an when I need a second, parallel task (which cannot be a long runner again!). As I (true to the agile) first wrote the new task submission code, I defined the:

isLongrunner(filename)

method, which looked up if the input data file could be a potential long-runner. This method was used to check if a newly arrived request should be run in parallet to the currently running one. Then the library was tested and the first version was finished.

In the next test iteration I added pulling of the requests from the input queue when a short-runner gets ready and the long-runner is still busy. So I had to check if a request sitting in the input queue were a short-runner. But wait, our requests are files, and we already have a method to check for it! DRY baby! Elegant code! I tested it locally, and then it was deployed to our customer on the other side of the globe, namely in Asia. Mission acomplished, no problems expected... You wish! Of course, in Asia the new client library didn't quite work as expected, and the problem that had to be corrected still showed up. Embarassement, client unhappy! After many calls and emails I had at last the logfiles with the error showing clearly up.

What was the problem? In the parallel task submitting code at the end of a short running request I DRY-reused the

isLongrunner(filename)

method, I should rather use:

isLongrunnerSession(sessFilename)

What's the difference? Well, in my client library there is one: the filenames waiting in the input queue will be processed and can be renamed in some specific cases. The isLongrunner() method will check the already processed filenames, as it's passed as paramether to the WorkerThread::process() method*. Unfortunately, in the second case, we are already in the process() method, and are looking directly into the input queue, where the raw file names are to be found! Bugger! And the local test worked, because..., well I can't remember the reason now, probably I didn't test the interface variant (there are two, legacy and modern one - it's a real, living and growing system) the asian client was using. Ooops :(

So what do we learn from this story?

If I wasn't so obsessed with code elegance and DRY, I wouldn't be lured into bad reuse, and then the omission to test the right interface version wouldn't have such unpleasant consequences.
If I'd care about the naming of the methods and its parameters to the point of obsession, I wouldn't settle for the first name which somehow filled the bill. It was sloppy!

So the advice is as with the optimization: first get the code right, that get it DRY! And sometimes the techniques we are eager to use are more of hindrance than of help in a subtle way.

--
* WorkerThread is a part of my portable (as it's Qt-based) worker library, maybe I'll find the time to blog about it. At least the title is already chosen: "Are workers actors?".