Saturday, 28 May 2011

Don't Decay your Arrays!

OK, that's nothing new, the technique is old, will be obsolete soon with the new std::tr1::array, and why should we use naked arrays anyhow? But recently I couldn't write it down as easily as I would like to, and as it's useful in the pre C++2011 code, why not discuss it shortly here?

Imagine you have to implement a Netbios/SMB remote file access protocol. The problem we are faced with while doing this is trivial but important - we need to log different packet types, operation codes and command options for debugging. They are all defined as integer type values and C++ doesn't (nor does any language I know) offer much support in that respect. So we define our own mini framework using macros. Don't panic, macros aren't that bad as they are told, and in our use case they are a perfect fit: translating between code (integer value) and strings (its descriptions). It's like eval() but in the other direction :). And it is working like this:
  // trace helper tables
  #define DEF_DESCRIPTION_ROW(a) {a, #a}
  struct CmdNameEntry { int cmdId; string cmdName; };

  const CmdNameEntry g_cmdNameTableNetbios[] = {
  const CmdNameEntry g_cmdNameTableSmb[] = { 
  const CmdNameEntry g_cmdNameTableTrans[] = { 
  const CmdNameEntry g_cmdNameTableTrans2[] = {                 
Got it? We use C's stringize operator (#) to convert a constant's name to a string in compile time. You see, we need quite a couple of tables, and each of the tables has many entries (you have to believe me, Netbios & Co are rather a chatty bunch). So why are we using naked arrays? Because the std::vector class didn't have a convenient initializer before C++2011*, a shame! Alternatively you could copy the array's contents to a fresh vector just to pass it to futher processing, but how ugly is that?

Now you'd need to write something like this to translate between integer codes and their descriptions:
  getCmd(g_cmdNameTableSmb, sizeof(g_cmdNameTableSmb)/sizeof(CmdNameEntry)),
because in C/C++ you cannot define a true function on arrays like this:
  getCmd(CmdNameEntry table[N]).
Well, it seems you can after all, but it's of no use anyway because (as you know) inside of the function the array will decay to a pointer and the size information will be lost. I don't know how other people may feel about it, but after some time I got pretty annoyed with that. Couldn't we wrap this in a macro, or even better, use some template trickery? Let's try and write a small utility:
  template<typename T, size_t N>
    string SmbProtocol::getCmdName(int code, T(&table)[N], const char* errorStr)
      for(size_t i = 0; i < N); i++) // <- oops!!! compile error!
          if(table[i].cmdId == code)
              return table[i].cmdName;
      return errorStr;
Uh-oh, that doesn't compile, but frankly, how should it? N isn't a runtime construct, it is type information, and it lives only at the compile time! What we need is something to translate between type and value, i.e. a bridge between compile and run time. It appears that's not so complicated as it sounds, just write**:
  namespace util
  template<typename T, size_t N>
    size_t array_size(T(&)[N]) { return N; }      
As we are in a template, we can access the type information, and as we are a function, we can return a value. Bridge ready! Now we only need to replace the incorrect line with:
  for(size_t i = 0; i < util::array_size(table); i++)
and now we are ready to go:
  string SmbProtocol::getNetbiosCmdName(int code)
      return getCmdName(code, g_cmdNameTableNetbios, "UNKNOWN_SESS_MSG");
Now we can define new description tables, forward them to functions using the pair of T(&table)[N]and util::array_size() and let our pre C++2011 code look a little more elegant.
* You can have a look at this previous installment of this blog for the new C++0x initialization syntax .
** if you wonder about the array parameter syntax look here for an explanation: template-parameter-deduction-from-array-dimensions

Thursday, 26 May 2011

Bad Case of Code Reuse

Bug story!

We all strive to write elegant, concise and clear code. One of the tenants of this pursit is the "DRY" principle - don't repeat yourself! But as usual you shouldn't always follow a principle blindly. Take for example the following bug which caused me some problems at customer's premises.

I just extended the client library of the system I was in charge of at that time to suppport parallel requests, as the legacy server code would would be too risky to be refactored. Why? Well, not everyone has the luxury to have (or even sometimes to be able to define) a comprehensive set of unit tests. In case of the legcy product there were none, and worse, the sheer number of possible inputs (any satellite data, corrupted or not) does not allow the system to be wholy tested!

I decided to supervise long running tasks and send a short running ones in parallel, so the concurrency level would be bounded by 2.  At two points in the code I have got to check if the next task is a long-runner or not: when a single new task is submitted for processing, an when I need a second, parallel task (which cannot be a long runner again!). As I (true to the agile) first wrote the new task submission code, I defined the:
method, which looked up if the input data file could be a potential long-runner. This method was used to check if a newly arrived request should be run in parallet to the currently running one. Then the library was tested and the first version was finished.

In the next test iteration I added pulling of the requests from the input queue when a short-runner gets ready and the long-runner is still busy. So I had to check if a request sitting in the input queue were a short-runner. But wait, our requests are files, and we already have a method to check for it! DRY baby! Elegant code! I tested it locally, and then it was deployed to our customer on the other side of the globe, namely in Asia. Mission acomplished, no problems expected... You wish! Of course, in Asia the new client library didn't quite work as expected, and the problem that had to be corrected still showed up. Embarassement, client unhappy! After many calls and emails I had at last the logfiles with the error showing clearly up.

What was the problem? In the parallel task submitting code at the end of a short running request I DRY-reused the
method, I should rather use:
What's the difference? Well, in my client library there is one: the filenames waiting in the input queue will be processed and can be renamed in some specific cases. The isLongrunner() method will check the already processed filenames, as it's passed as paramether to the WorkerThread::process() method*. Unfortunately, in the second case, we are already in the process() method, and are looking directly into the input queue, where the raw file names are to be found! Bugger! And the local test worked, because..., well I can't remember the reason now, probably I didn't test the interface variant (there are two, legacy and modern one - it's a real, living and growing system) the asian client was using. Ooops :(

So what do we learn from this story?
  1. If I wasn't so obsessed with code elegance and DRY, I wouldn't be lured into bad reuse, and then the omission to test the right interface version wouldn't have such unpleasant consequences.
  2. If I'd care about the naming of the methods and its parameters to the point of obsession, I wouldn't settle for the first name which somehow filled the bill. It was sloppy!
 So the advice is as with the optimization: first get the code right, that get it DRY! And sometimes the techniques we are eager to use are more of hindrance than of help in a subtle way.

* WorkerThread is a part of my portable (as it's Qt-based) worker library, maybe I'll find the time to blog about it. At least the title is already chosen: "Are workers actors?".