It Will Never Work in Theory

An experiment about static and dynamic type systems

Posted Oct 25, 2012 by Christoph Treude

| Controlled Experiments | Programming Languages |

Stefan Hanenberg. "An experiment about static and dynamic type systems: doubts about the positive impact of static type systems on development time". OOPSLA 2010.

Although static type systems are an essential part in teaching and research in software engineering and computer science, there is hardly any knowledge about what the impact of static type systems on the development time or the resulting quality for a piece of software is. On the one hand there are authors that state that static type systems decrease an application's complexity and hence its development time (which means that the quality must be improved since developers have more time left in their projects). On the other hand there are authors that argue that static type systems increase development time (and hence decrease the code quality) since they restrict developers to express themselves in a desired way. This paper presents an empirical study with 49 subjects that studies the impact of a static type system for the development of a parser over 27 hours working time. In the experiments the existence of the static type system has neither a positive nor a negative impact on an application's development time (under the conditions of the experiment).

How many experiments in software engineering research are you aware of where the researcher developed a new programming language and corresponding IDE just for the experiment? Well, Stefan Hanenberg did exactly that, and the results are remarkable. The goal of his experiment was to measure the impact of static vs. dynamic type systems on development time and software quality. There is a lot of conventional wisdom about static and dynamic type systems (e.g., static type systems catch many recurring programming errors and make systems easier to maintain, while dynamic type systems make life easier by not imposing unnecessary restrictions). However, there is very little hard evidence to support these claims, and for a practitioner it is unclear which arguments can be trusted.

Unlike previous work, Hanenberg decided not to use existing programming languages and IDEs in his experiment because he worried that subjects' familiarity with the tooling would influence the results, in particular if his subjects knew only the dynamic or only the static version used in the study. Therefore, he developed a new object-oriented programming language, "Purity" (with some similarities to Smalltalk, Ruby and Java), and a corresponding IDE (class browser, test browser and console). In fact, he developed two versions of Purity: one with a static type system, the other with a dynamic type system. The two versions were identical in all other respects.

His experimental setup followed a between-subject design (i.e., each subject worked with only one of the two Purity versions). He recruited 49 students, divided them into two groups, and taught each group one of the Purity versions (the dynamic type version was taught for 16 hours and the static type version was taught for 18 hours). All subjects were then given exactly 27 hours to implement a scanner and a parser for a given grammar. Hanenberg measured two outcomes: development time and quality. Development time was measured from log entries and test cases, which determined the exact point in time when a subject's code passed all the test cases for a minimal scanner. The quality of the parser was measured through 400 test cases that represented valid and invalid words in the grammar.
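To make the quality measure concrete, here is a minimal sketch of scoring a parser by the fraction of labelled test words it classifies correctly. Everything in it is an assumption made for illustration: the balanced-parentheses grammar, the `is_balanced` recognizer, the `quality_score` function, and the six test words are hypothetical and are not taken from the paper, which used its own grammar and 400 test cases.

```python
from typing import Callable, Iterable, Tuple


def is_balanced(word: str) -> bool:
    """Hypothetical recognizer: accepts words made of balanced parentheses."""
    depth = 0
    for ch in word:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a closing parenthesis with no matching opener
                return False
        else:  # any other character is outside the toy alphabet
            return False
    return depth == 0


def quality_score(
    parser: Callable[[str], bool],
    cases: Iterable[Tuple[str, bool]],
) -> float:
    """Return the fraction of (word, expected) cases the parser gets right."""
    cases = list(cases)
    passed = sum(1 for word, expected in cases if parser(word) == expected)
    return passed / len(cases)


# Illustrative test words only: three valid and three invalid.
CASES = [
    ("", True),
    ("()", True),
    ("(())()", True),
    ("(", False),
    (")(", False),
    ("(a)", False),
]

if __name__ == "__main__":
    print(quality_score(is_balanced, CASES))
```

Including invalid words matters: a parser that accepts every input would pass all the valid cases but fail every invalid one, so a mixed test set separates a correct parser from a permissive one.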

The main result from Hanenberg's study is that -- under the conditions of his experiment -- the existence of a static type system did not have a positive impact on development time or quality. In fact, the subjects who used the dynamic type version of Purity were significantly faster in developing a scanner, but there was no statistically significant difference with respect to quality of the final product.
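For readers unfamiliar with how two independent groups in a between-subject design can be compared, the sketch below shows one generic approach, a two-sided permutation test on the difference in group means. It is not necessarily the analysis used in the paper. The function name `permutation_p_value`, its parameters, and the numbers in the usage example are all invented for illustration and are not data from the study.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(
    group_a: Sequence[float],
    group_b: Sequence[float],
    rounds: int = 2000,
    seed: int = 0,
) -> float:
    """Estimate a two-sided p-value for the difference in group means.

    Group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    split = len(group_a)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:split]) - mean(pooled[split:]))
        if diff >= observed:
            hits += 1
    return hits / rounds


if __name__ == "__main__":
    # Invented placeholder values (hours), NOT measurements from the study.
    dynamic_hours = [5, 6, 7, 8, 9]
    static_hours = [7, 8, 9, 10, 11]
    print(permutation_p_value(dynamic_hours, static_hours))
```

A small p-value means the observed gap would rarely arise if group membership made no difference, which is the sense in which a result such as "significantly faster" is usually meant.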

In addition to conducting and describing a well-planned and well-executed experiment, Hanenberg does a thorough job of explaining and justifying his choice of methods, both for data collection and data analysis. He also discusses the limitations of his work in great depth, in particular that it is impossible to draw general conclusions from one experiment. However, what a single experiment such as Hanenberg's can do is cast doubt on the role of static type systems in software engineering, and his work opens up many avenues for future work on which programming languages work better than others, and why.
