Thursday, 26 May 2011

Bad Case of Code Reuse

Bug story!

We all strive to write elegant, concise and clear code. One of the tenants of this pursit is the "DRY" principle - don't repeat yourself! But as usual you shouldn't always follow a principle blindly. Take for example the following bug which caused me some problems at customer's premises.

I just extended the client library of the system I was in charge of at that time to suppport parallel requests, as the legacy server code would would be too risky to be refactored. Why? Well, not everyone has the luxury to have (or even sometimes to be able to define) a comprehensive set of unit tests. In case of the legcy product there were none, and worse, the sheer number of possible inputs (any satellite data, corrupted or not) does not allow the system to be wholy tested!

I decided to supervise long running tasks and send a short running ones in parallel, so the concurrency level would be bounded by 2.  At two points in the code I have got to check if the next task is a long-runner or not: when a single new task is submitted for processing, an when I need a second, parallel task (which cannot be a long runner again!). As I (true to the agile) first wrote the new task submission code, I defined the:
method, which looked up if the input data file could be a potential long-runner. This method was used to check if a newly arrived request should be run in parallet to the currently running one. Then the library was tested and the first version was finished.

In the next test iteration I added pulling of the requests from the input queue when a short-runner gets ready and the long-runner is still busy. So I had to check if a request sitting in the input queue were a short-runner. But wait, our requests are files, and we already have a method to check for it! DRY baby! Elegant code! I tested it locally, and then it was deployed to our customer on the other side of the globe, namely in Asia. Mission acomplished, no problems expected... You wish! Of course, in Asia the new client library didn't quite work as expected, and the problem that had to be corrected still showed up. Embarassement, client unhappy! After many calls and emails I had at last the logfiles with the error showing clearly up.

What was the problem? In the parallel task submitting code at the end of a short running request I DRY-reused the
method, I should rather use:
What's the difference? Well, in my client library there is one: the filenames waiting in the input queue will be processed and can be renamed in some specific cases. The isLongrunner() method will check the already processed filenames, as it's passed as paramether to the WorkerThread::process() method*. Unfortunately, in the second case, we are already in the process() method, and are looking directly into the input queue, where the raw file names are to be found! Bugger! And the local test worked, because..., well I can't remember the reason now, probably I didn't test the interface variant (there are two, legacy and modern one - it's a real, living and growing system) the asian client was using. Ooops :(

So what do we learn from this story?
  1. If I wasn't so obsessed with code elegance and DRY, I wouldn't be lured into bad reuse, and then the omission to test the right interface version wouldn't have such unpleasant consequences.
  2. If I'd care about the naming of the methods and its parameters to the point of obsession, I wouldn't settle for the first name which somehow filled the bill. It was sloppy!
 So the advice is as with the optimization: first get the code right, that get it DRY! And sometimes the techniques we are eager to use are more of hindrance than of help in a subtle way.

* WorkerThread is a part of my portable (as it's Qt-based) worker library, maybe I'll find the time to blog about it. At least the title is already chosen: "Are workers actors?".

No comments: