Saturday 25 October 2008

Erlang and Map-Reduce

I have been a fan of Google's Map-Reduce for quite a long time: I first did my PhD research in distributed systems and then worked on some high-availability projects, so the interest emerged quite naturally.

Their (i.e. Google's) achievements quite impressed me: first the whole infrastructure (a lot of C++ coding - they simply wrote their own filesystem and database!!!), but equally the level of abstraction used when working with distributed data. I wrote before about the superiority of that model over the SOA model, although the SOA model is probably the best you can achieve in a heterogeneous environment (and I was probably wrong there...).

So every time I see a map-reduce implementation I can't help reading about it: Hadoop is the best-known open source implementation, but there's the QtConcurrent::mappedReduced algorithm in the new Qt 4.5* as well. You see, the idea seems to be catching on.

Now to the news: there is a Map-Reduce implementation in Erlang (!!!)** which runs Python scripts (!!!) and it's called Disco***! And if you don't have a massively parallel cluster at home, you can run it in Amazon's Elastic Compute Cloud! I don't like Nokia very much, but I must admit that this one is rather cool: you simply write scripts to manipulate your data, much in the vein of UNIX shell programming, only infinitely scalable! And we know that scripting languages are much better for data manipulation than Java or C++. According to its homepage, Disco is quite a success too:

"This far Disco has been successfully used, for instance, in parsing and reformatting data, data clustering, probabilistic modelling, data mining, full-text indexing, and log analysis with hundreds of gigabytes of real-world data." ***

Wow! I like the idea of Erlang and Python working in unison!
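
To give an idea of what such a data-manipulation script boils down to, here is a minimal word-count sketch in plain Python. Note that it doesn't use Disco's actual API: the names map_words, reduce_counts and map_reduce are made up for illustration, and everything runs in a single process, whereas a real framework would presumably spread the map calls over the nodes of a cluster.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(pairs):
    """Reduce step: sum up the counts emitted for each word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def map_reduce(lines, mapper, reducer):
    """Hypothetical driver: map every input line, then reduce all pairs.

    Runs sequentially in one process; a distributed framework would
    farm out the mapper calls and shuffle the pairs to the reducers.
    """
    pairs = []
    for line in lines:
        pairs.extend(mapper(line))
    return reducer(pairs)


lines = ["erlang runs the cluster", "python runs the scripts"]
counts = map_reduce(lines, map_words, reduce_counts)
```

The point of the model is that map_words and reduce_counts know nothing about distribution; only the driver would have to change to go from one machine to many.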

---
* Qt 4.5 docs: http://doc.trolltech.com/main-snapshot/threads.html#qtconcurrent
** a small introduction to Erlang: http://ib-krajewski.blogspot.com/2007/08/erlangs-change-of-fortunes.html
*** Disco's homepage: http://discoproject.org/

Tuesday 7 October 2008

C++ servlets - again

It looks like it's getting to be a new hobby of mine: collecting C++ web application frameworks. After the first one* (a simple HTTP server and session classes) and the modern one (Wt aka witty)**, the "missing link" has now been found: the classic Java-like Servlet container implementation in C++! This was made possible by a friendly fellow blogger, Eduardo Zea.

Eduardo was kind enough to give me a link to the DDJ article describing such an implementation by Rogue Wave, named Bobcat***. It's quite old (by SW-industry standards) and the link to the evaluation downloads doesn't work anymore, so I think it didn't quite catch on. But it's another one in my collection! So the current score is C++=3, Java=googol.

PS: To be more precise, the Bobcat functionality is now part of Hydra Express****, Rogue Wave's SOA publishing framework. So are we all going SOAP?

---
* see: http://ib-krajewski.blogspot.com/2007/09/servlets-in-c.html
** The Wt-framework: http://www.webtoolkit.eu/wt/
*** John Hinke, Implementing C++ Servlet Containers, April 01, 2002: http://www.ddj.com/184405023
**** http://www.roguewave.com/blog/so-what-is-it-with-rogue-wave-and-xml-soa/