Monday 14 June 2010

JavaScript on the server?

Recently, I stumbled upon an interesting presentation about server I/O parallelism and multithreading. Surprisingly, it was given at a JavaScript conference, which I found only because I occasionally follow developments in the JavaScript language. I'd never have expected to find something like that in this context, though.

The general message of the presentation was that threads suck and asynchronous programming is king. Come again? I don't quite agree with that (or rather didn't agree, as you'll see shortly). Weren't threads supposed to save us from onerous asynchronous programming, with its context information and callbacks (remember Windows event programming)? Threads rightly delivered a convenient abstraction for grouping logically connected data and logic into separate units of execution. So now we are coming full circle: from asynchronous to threads and back to asynchronous?

OK, on the surface it may look like that, but there's more to it. The problem is that when we apply threads to parallel I/O, they are just too slow (you know, context switching, thread stampedes and all that). Hence there's a host of asynchronous I/O methods, starting with the venerable select() UNIX call. OK, agreed, for that special field the threading model isn't very appealing, but for general usage I'd just hide this inside an I/O thread...

Now there's this presentation (Ryan Dahl’s talk on Node.js), and what the author says is exactly that: callbacks and asynchronicity are good! How come? Don't we all know it's a pain in practice? His response (and this is the valuable insight I got from watching it) is that we just didn't have a good enough programming language to realise that! When you do it in C (as you'd likely do when programming performant I/O) you end up in a mess, but if you take JavaScript...

See, that's the crux here: JavaScript was designed to work in a callback-driven environment, so the language has mechanisms, such as first-class functions and closures, that C simply doesn't offer. Thus if we take JavaScript and do event-oriented programming, it's a breeze! The idea is to take an ultra-high-speed C library (libev in this case) and wrap it in an event-friendly JavaScript shell. Here's some example code from the node.js site:
  var net = require('net');
  net.createServer(function (socket) {
    socket.setEncoding("utf8");
    socket.addListener("connect", function () {
      socket.write("Echo server\r\n");
    });
    socket.addListener("data", function (data) {
      socket.write(data);
    });
    socket.addListener("end", function () {
      socket.end();
    });
  }).listen(8124, "127.0.0.1");

Well, maybe it isn't a thing of beauty (if you're not into JavaScript), but it's concise, everything is in one place, and you can read it without reading tons of manuals first. So maybe this is the way to do it? Another thing I liked is that node.js never provides a blocking API: even filesystem calls are asynchronous! Now that's consistency!
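
To make the point about asynchronous filesystem calls a bit more concrete, here is a minimal sketch of my own (it is not from the talk or the node.js site). It assumes the callback-style writeFile, readFile and unlink calls of node's fs module; the scratch file name and the trace array are made up purely for the illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch file, used only for this demonstration.
const demoFile = path.join(os.tmpdir(), 'node-async-demo.txt');

// Records the order in which things actually happen.
const trace = [];

// Write a file, then read it back. Neither call blocks: each one takes a
// callback that runs later, once the I/O has completed.
fs.writeFile(demoFile, 'Echo server\r\n', function (writeErr) {
  if (writeErr) { throw writeErr; }
  trace.push('write finished');

  fs.readFile(demoFile, 'utf8', function (readErr, text) {
    if (readErr) { throw readErr; }
    trace.push('read finished');
    console.log(trace.join(' -> '));
    console.log('read back ' + text.length + ' characters');
    fs.unlink(demoFile, function () {}); // clean up, ignoring errors
  });
});

// This line runs before either of the callbacks above.
trace.push('requests issued');
```

The callbacks see demoFile and trace through ordinary closures, so there is no hand-rolled context structure to pass around, which is the very thing that makes the same pattern so painful in C.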

Sorry for the short post, but it's summer, and I want to go out and watch some World Cup!

---
Resources:
1. The talk can be found here: http://jsconf.eu/2009/video_nodejs_by_ryan_dahl.html
2. And here's the code: https://gist.github.com/a3d0bbbff196af633995
3. If you'd like some more technical details, here's an excellent post by Simon Willison: http://simonwillison.net/2009/Nov/23/node/
4. A node.js critique (yes, there's such a thing too!): http://al3x.net/2010/07/27/node.html

Addendum:
- Here is a simple blog engine in less than 200 lines of code using node.js: http://github.com/zefhemel/persistencejs/blob/master/test/node-blog.js. Not bad, I'd say.
- Multi-node: how to implement a concurrent node.js HTTP server: http://www.sitepen.com/blog/2010/07/14/multi-node-concurrent-nodejs-http-server/
- Hummingbird: an impressive application based on node.js - http://mnutt.github.com/hummingbird/


Sunday 13 June 2010

Software estimation and re-reading the classics

Recently I stumbled upon the "Software Top-Ten List"* again. At first I thought, OK, that's old, they did mainframes at that time! But then I read it nonetheless. Maybe because it's so short? It's only 3 pages. Whatever. I read it, and I was surprised by how refreshing it was. Not mouldy, stale and irrelevant, but interesting and fresh! That was surprising, because the findings are supposed to be well known.

One interesting thought came after I read point 8:
"Software systems and software products each typically cost 3 times as much per instruction to fully develop as does an individual software program."
I immediately thought about that old estimation rule: estimate how long it will take to get done, and then multiply it by two. I always thought it to be unserious or a manifestation of intellectual laziness. What, you aren't able to say how long this effing program will take to write? Just sum up the parts!

But let's look at the problem from a slightly different angle: what if programmers do not take into account the costs of polishing the program, making it user-friendly, plus all the problems and dependencies coming from other programmers located in other departments of the company? "L'enfer, c'est les autres" (hell is other people)! Normally you do not take these factors into account when estimating, and you cannot even estimate them properly. We use either a brute-force 20% project buffer or a statistical estimate based on the PERT method, but somehow it doesn't work out that well.

So maybe we should take the measurement data cited above and use it when estimating? Just multiply our technical-only estimate by the "real world" factor? Personally, I still cannot believe that the "real world" factor should be as high as 3! Am I too optimistic? But I think that frw=2 should be a good choice!
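
As a rough sketch of that idea: sum up the per-part estimates and multiply the total by frw. I'm assuming here that the technical estimate for each part is the usual PERT three-point value, (optimistic + 4 * likely + pessimistic) / 6; the part names and the numbers (in days) are invented for the illustration.

```javascript
// PERT three-point estimate for a single part, in days.
function pertEstimate(task) {
  return (task.optimistic + 4 * task.likely + task.pessimistic) / 6;
}

// Technical-only estimate: just sum up the parts.
function technicalEstimate(tasks) {
  return tasks.reduce(function (sum, task) {
    return sum + pertEstimate(task);
  }, 0);
}

// Scale the technical estimate by the "real world" factor frw.
function realWorldEstimate(tasks, frw) {
  return technicalEstimate(tasks) * frw;
}

// Hypothetical parts of a small project.
const tasks = [
  { name: 'parser',  optimistic: 2, likely: 3, pessimistic: 10 },
  { name: 'storage', optimistic: 4, likely: 5, pessimistic: 12 }
];

console.log('technical only:', technicalEstimate(tasks));
console.log('with frw=2:', realWorldEstimate(tasks, 2));
console.log('with frw=3:', realWorldEstimate(tasks, 3));
```

With these made-up numbers the technical estimate comes to 10 days, so frw=2 gives 20 days and the factor of 3 from the paper gives 30.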

Although I'm a little wary here, as I always implement a given piece of code faster than my own PERT estimate. But recently, when estimating an entire system, I missed the mark completely: new requirements, unexpected dependencies on not-so-great libraries, the internal pressure to please the customer... This all adds up to a bigger project! And in that case I should have multiplied by as much as 4! Welcome to the real world.

--
* Barry W. Boehm, "Industrial Software Metrics: A Top-Ten List",
http://csse.usc.edu/csse/TECHRPTS/1985/usccse85-499/usccse85-499.pdf