Their (i.e. Google's) achievements quite impressed me: first the whole infrastructure (lot of C++ coding - they simple wrote their own filesystem and database!!!), but equally the level of abstraction used when working with distributed data. I wrote about the superiority of that model over the SOA model before, although the SOA model is probably the best you can achieve in a heterogenous environment (and I was probably wrong there...).
So every time I see a map-reduce implementation I can't help reading about it: Hadoop is the most known open source implementation, but there's the QtConcurrent::mappedReduced algorithm in the new Qt 4.5* as well. You see, the idea seems to be catching on.
Now to the news: there is a Map-Reduce implementation in Erlang (!!!)** which runs Python scripts (!!!) and it's called Disco***! And if you don't have a massive parallel cluster at home, you can run it in the Amazon's Elastic Computing Cloud! I don't like Nokia very much, but I must admit that this one is rather cool: you simply write scripts to manipulate your data, much in the vein of UNIX shell programming, only infinitely scalable! And we know that scripting languages are much better for data manipulations than Java or C++. According to its homepage, Disco is quite a success too:
This far Disco has been succesfully used, for instance, in parsing and reformatting data, data clustering, probabilistic modelling, data mining, full-text indexing, and log analysis with hundreds of gigabytes of real-world data. ***Wow! I like te idea of Erlang and Python working unisono!
---
* Qt 4.5 docs: http://doc.trolltech.com/main-snapshot/threads.html#qtconcurrent
** a small itroduction to Erlang: http://ib-krajewski.blogspot.com/2007/08/erlangs-change-of-fortunes.html
*** Disco's homepage: http://discoproject.org/