On Software and Languages: Empirical Results on the Static vs. Dynamic Debate?

You may know that I'm quite interested in the "Static vs. Dynamic Typing" controversy, and already even clearly declared my allegiance basing my decision on some pretty sloppy reasoning (if not largely on the gut feeling...).

But what about some hard facts, numbers, measurements and controlled experiments to decide the debate? Like the Leibnizian* "Calculemus!". Well, some good soul (@danluu) took the burden of this task and looked** at the best known papers, talks and comparison studies. To what avail?

First there's a warning:

"If you look at the internet commentary on these studies, most of them are passed around to justify one viewpoint or another. The Prechelt study on dynamic vs. static, along with the follow-ups on Lisp are perennial favorites of dynamic language advocates, and github mining study has recently become trendy among functional programmers."

So apparently, there's little interest to find out the truth, only to defend convenient positions. But we are fearless seekers for the truth, and want to know what the results are. Unfortunately the classic studies, when examined in more detail reveal some serious methodological flaws, and the author could only conclude that:

"Other than cherry picking studies to confirm a long-held position, the most common response I’ve heard to these sorts of studies is that the effect isn’t quantifiable by a controlled experiment."

OK, pretty sobering, we still know nothing, it's all lies, damn lies and/or lacking statistical knowledge. But how come the theoretically superior static typing, and that by a simple argument - how can be more support from the compiler not be better than less support - do not bring any measurable improvements?

Lets reiterate the position taken by supporters of dynamic typing in words taken from a talk at JavaZone:

If these academic computer scientists would get out more, they would soon discover an increasing incidence of software developed in languages such a Python, Ruby and Clojure which use dynamic, albeit strong, type systems. They would probably be surprised to find that much of this software—in spite of their well-founded type-theoretic hubris—actually works, and is indeed reliable out of all proportion to their expectations.

So it seems to work in real life, and the experimental studies cannot prove the contrary. But it also looks like someone is trying hard to ignore something, don't you think so?

But then I discovered a recent paper which proposed an new angle of attack!

While the old studies tried to measure the performance of programmers (and somehow failed to design a meaningful experiment) the authors of the new one pursued a more modest goal. They introduced random changes to program's code and then tried to compile and run it. In their words:

"Our study is based on a diverse corpus of programs written in several programming languages systematically perturbed using a mutation-based fuzz generator.
....
More importantly, our study also demonstrates the potential of comparative language fuzz testing for evaluating programming language designs."

We thus could have a possibility to compare programming languages in some way? Sounds promising! So what are the results?

"The results we obtained prove that languages with weak type systems are significantly likelier than languages that enforce strong typing to let fuzzed programs compile and run, and, in the end, produce erroneous results."

The exact results are maybe interesting***:

Ruby:
   - compile: 54%
   - run       : 31%
Python:
   - compile: 43%
   - run       : 25%
PHP: (that poster child of bad language design!)
   - compile: 48%
   - run       : 41%
Javascript:
   - compile: 49%
   - run       : 23%
Java:
   - compile: 18%
   - run       : 15%
Haskell:
   - compile: 19,5%
   - run       : 18%
C++:
   - compile: 20,5%
   - run       : 12%

See, On the average, dynamically typed programs are twice as likely to be compiling/running wioth nonsensical source code. That's definitely SOME improvement, isn't it?

On the other side: they are testing programming languages' resiliency against typos - but wasn't that the exact reason for introduction of types in programming languages? So we know that typed languages are good in what they were invented for. But we knew that already.

Update: This blogpost sets up a different thesis: the difference between statically and dynamically typed languages isn't that important. Rather than that, the simplicity of the language design is decisive in order to avoid bugs! IMHO this argument is misguided, because what I wanted to know is, given two languages of "same complexity" (problems, problems, hot to measure that?) which is a better one - the one with or without static typing?

--
* Nice little fact - Leibniz's program for mathematization of human knowledge was formulated in the following words:

"Actually, when controversies arise, the necessity of disputation between two philosophers would not be bigger than that between two computists. It would be enough for them to take the quills in their hands, to sit down at their abaci, and to say (as if inviting each other in a friendly manner): Let's calculate! (Calculemus!)"

Unfortunately, this cannot be done (at least according to Turing) and flame wars still prevail.

** http://danluu.com/empirical-pl/ - "Static vs. Dynamic Languages: A Literature Review"

*** Caution: they are not exact, I read it out of a diagram!

On Software and Languages

Tuesday, 29 March 2016

Empirical Results on the Static vs. Dynamic Debate?

No comments:

Post a Comment