Wednesday, 29 April 2009

From DLL- to XML-hell

This is one of the original articles I wanted to write when I started this blog. Well, it's taken some time, it's a little outdated now, but at last I've kept my promise (or rather one of them) !

1. The Lament


As I switched from C++ to Java in some previous project I thought I plunged directly from the DLL- to the XML-hell! Well, not exacltly (as I didn't programm under Windows in the preceeding project) but a little hyperbole makes a good start! The XML part of the title however, is true. The very first thing I noticed, was the ubiquitous XML - you had to use it for compilation (Ant scripts), you had to use it for for deployment (web.xml), you had to use it for your application (struts2.xml, spring.xml, log4j.xml).

The most annoying part of this was the change from makefiles to Ant scripts: so much superfluous verbiage! Look, compare the Ant script with the equivalent Ant-builder script in Groovy, taken from the "Groovy in Action" book, first the XML file:

<project name="prepareBookDirs" default="copy">
<property name="target.dir" value="target"/>
<property name="chapters.dir" value="chapters"/>
<target name="copy">
<delete dir="${target.dir}" />
<copy todir="${target.dir}">
<fileset dir="${chapters.dir}"
includes="*.doc"
excludes="~*" />
</copy>
</target>
</project>
Then the builder script:
  TARGET_DIR = 'target'
CHAPTERS_DIR = 'chapters'
  ant = new AntBuilder()
ant.delete(dir:TARGET_DIR)
ant.copy(todir:TARGET_DIR){
fileset(dir:CHAPTERS_DIR, includes:'*.doc', excludes:'~*')
}

Well, that's at least readable (and writeable), and it states exactly what you want! When I see such a comparison I think that the XML code should be used only as Ant-internal, assembly-code level data representation! You could maybe find it interesting that even the creator of Ant himself admitted that using XML was "probably an error"*.

The second problem was that, (as another programmer put it):

Developers from other languages often find Java's over-reliance on XML configuration annoying. We use so much configuration outside of the language because configuration in Java is painful and tedious. We do configuration in XML rather than properties because...well, because overuse of XML in Java is a fad.**

I can only confirm this. You can get a feel of how much of XML "code" was needed in a Java web application I was then writing from the following (addmittedly pretty old) table*** :

     Metric               Java +              Ruby +
Spring/Hibernate Rails


Time to market 4 months, approx. 4 nights,
20 hours/week 5 hours/night
Lines of code 3,293 1,164
Lines of config 1,161 113
Number of classes/ 62/549 55/126
methods

Do you see? The configuration, it's 30% of the code (!!!), and that's 10 times more than was needed in Rails! That's some dependency! By the way, note that Rails needs only 3 times less code that Java, despite the fact that Java libraries can be sometimes pretty low level!

Now, for me the Spring 2 platform was somehow an apogeum of the XML overdependency - the endless configuration files weren't even human readable, you should better use a graphical Eclipse plugin:

to edit the wirings. That's like you need something like UML to tame the complexity of the XML files (and of Spring's configuration of course):


A rhetoric question: isn't it too much of a good thing for something as simple as configuration files? I found the Spring-2 MVC (Spring's web GUI application framework) particularly bad in that respect. Comparing it to the Struts 2 configuration, I had an impression that every obvious fact must be expressed in XML, and that there's no defaults altogether:

<bean name="userListController"
class="de.ib-krajewski.myApp.UserListController">
<property name="userListManager">
<ref bean="userListManager" />
</property>
....
</bean>
<bean id="userListManager"
class="de.ib-krajewski.myApp.UserListManager" />

and so on, and on, and on.

But wait, it comes even better: some people are so enamoured with XML that they are using it as a programming language****, either doing patterns with Spring and XML configuration files, or even inventing an XML-based programming language and writing its interpterer in XSLT (I'm not joking!).

A word of caution: Java community noticed the problem and is trying to reduce the amount of XML needed. Unfortunately, I haven't got any hard numbers, but each new release of a Java framework seems to claim a "reduced XML config file size" - take for example Spring MVC v.3. However, as the DLL hell is still a reality despite of Microsoft's efforts and the introduction of manifests (really, look here!), equally, for my mind the overdependence on XML won't vanish overnight from the Java world - it's at its core!

2. The Analysis

2.1 XML data

Why are we all using XML? The received standard answer is: because it's a portable, human-readable, standardized data format with a very good tool support!

So why does it all feel so wrong? The first reason I see is that XML is too low level for human consumption. It's claiming to be human-readable while it's only human-tolerable - see the Groovy AntBuilder example above. There are tons and tons of verbiage (although that's something a Java programmer will be rather accustomed to ;-)) which is what machines need, but it's not how human mind is working. It needs high level descriptions because it's only all to easy distracted by unneeded details.

When I recall my own config file implementations of yore, I conclude that I never needed more constructs than single config values ans lists of them. And I never needed hierarchical config data for my systems. So if people are using XML for configuration, they are maybe after something more than pure config settings, maybe they are looking for a scripting solution for the application (as mentioned in*) along the lines of the late Tcl? But with Tcl the model was entirely different: the Tcl scipts glueing together passive modules of code, whereas now, the modules are active and use XML as a database of sorts... So is it that at last? People using XML data as a poor man's, read only database? It's a good, ad-hoc solution, and you can emulate hierarchical, network and OO databases without much of a hassle, can't you? So people are using XML as a kind of qiuck and dirty hack, and hacks aren't normally very pretty ;-).

2.2 XML in Java

The second part of the question is: why are we using so much XML in Java? The answer is probably that we need XML config files to increase modularity an require the loose coupling of our systems. The Spring framework allows us to pull the parts of the system together at the startup or run-time, and that's supposedly a good thing, isn't it.

Coming to this part of the question: I think that the loose coupling thing is overrated. As Nicolai Yossutis said in his book about SOA (we reported ;-)):

"...loose coupling has its price, and it's the complexity."

So firstly, a little bit of coupling isn't that bad if it decreases complexity and increases readability of code! To my mind, it's a classical tradeoff, but as some developers are only too prone to introduce tight coupling all over the place, the received wisdom is to avoid the coupling at every price! But be honest, you can't decouple everything, if the parts are ment to be working together! Secondly, I thing that the "inversion of control" pattern is a bit of overkill too.

Personally, I'd like to glue together the parts of the system directly in code, so that I can see how my software is composed, and that in my source code - it should be the sole point of reference! Thus I like much more the approach Guice or Seam are taking (and nowadays the Spring framework itself) - that of annotations. At last we have the relevant information where it belongs - in code! The problem here is, that the annotations aren't really code, they are metadata, and I don't like the idea of using metadata where you could do things directly by using some API. But hey, that's probably the way we are doing such things in Java...

3. A Summary

That's what I wanted to write approximately 2 years ago. Meanwhile I use XML everywhere in my architectures and designs, mainly because it excludes any discussions about formats! No one has ever said anything against XML!

But programming it (in C++ and Qt) is a real pain - I first tried DOM, than SAX - and it was an even greater mess (well, there's a C++ data binding implementation á la Java's Apache XMLBeans but it's not used at my client's :-((). XPath's selectors are supposed to be better, but Qt doesn't have it by now. The good thing is, everyone will think the system is sooo cool because it uses XML, and I have to make my users happy!

PS: And you know what? The new dm application server by SpringSource uses JSON for configuration files!!! I told you so ;-)

--
* "The creator of Ant excercising hit own daemons", this article was originally stored under http://x180.net/Articles/Java/AntAndXML.html, but this link seems to be dead, so I'll host it on my website for a while (hope it's OK...): http://www.ib-krajewski.de/misc/ant-retrospect.html

** as a matter of fact, he's a Ruby programmer too, so he isn't that unpartial: http://www.brainbell.com/tutorials/java/About_Ruby.htm

***taken from the book "Beyond Java" by Bruce Tate: http://commons.oreilly.com/wiki/index.php/Beyond_Java/Ruby_on_Rails, unfortunately the original link stated there isn't working: Justin Gehtland, Weblogs for Relevance, LLC (April 2005); http://www.relevancellc.com/blogs. "I *heart* rails; Some Numbers at Last."

**** "Wiring The Observer Pattern with Spring": http://www.theserverside.com/tt/articles/content/SpringLoadedObserverPattern/article.html, and "XSL Transformations. A delivery medium for executable content over the Internet" in DDJ from April 05, 2007: http://www.ddj.com/architect/198800555 . I cite from the latter:

XIM is an XML-based programming language with imperative control features, such as assignment and loops, and an interpreter written in XSLT. XIM programs can thus be packaged with an XIM processor (as an XSLT stylesheet) and sent to the client for execution.
See, I wasn't joking...

1 comment:

S.P. Sobania said...

Sorry to hear about your XML hassle. Well, it's hard to believe nowadays, but at the beginning there was no XML in Java. And everything worked pretty well. I spotted the first parsers around 1999.

The idea of XML is of course that any intelligent IDE can hide the syntax from the user and provide all of the information in a user friendly way. As for Ant somehow this never did catch on.

I really would like that, if we could express most of the configuration stuff directly in Java and without annotations. I agree.

Originally the serialization mechanism was aimed at situations like that. But Sun started changing the file format from one JDK release to another, as a consequence no one ever used it.