It Will Never Work in Theory

Three Empirical Studies From ESEC/FSE'11

Posted Oct 22, 2011 by Greg Wilson

Tags: Quality, Quantitative Studies, Tools

As our previous post said, a lot of interesting work was presented at the joint ESEC/FSE conference in September. Three of my favorites, all reporting empirical studies, are:

  1. Sven Apel, Jörg Liebig, and Christian Kästner: "Semistructured Merge: Rethinking Merge in Revision Control Systems".
    An ongoing problem in revision control systems is how to resolve conflicts in a merge of independently developed revisions. Unstructured revision control systems are purely text-based and solve conflicts based on textual similarity. Structured revision control systems are tailored to specific languages and use language-specific knowledge for conflict resolution. We propose semistructured revision control systems that inherit the strengths of both: the generality of unstructured systems and the expressiveness of structured systems. The idea is to provide structural information of the underlying software artifacts — declaratively, in the form of annotated grammars. This way, a wide variety of languages can be supported and the information provided can assist in the automatic resolution of two classes of conflicts: ordering conflicts and semantic conflicts. The former can be resolved independently of the language and the latter using specific conflict handlers. We have been developing a tool that supports semistructured merge and conducted an empirical study on 24 software projects developed in Java, C#, and Python comprising 180 merge scenarios. We found that semistructured merge reduces the number of conflicts in 60% of the sample merge scenarios by, on average, 34%, compared to unstructured merge. We found also that renaming is challenging in that it can increase the number of conflicts during semistructured merge, and that a combination of unstructured and semistructured merge is a pragmatic way to go.
    Almost all version control systems treat files as lines of text, ignoring whatever program structure they contain. The few that diff and merge at the logical level work only that way, and are usually available only as part of all-or-nothing programming environments. In this paper, the authors look at a hybrid approach that tries to combine the good features of both pure alternatives. The tool itself is interesting, but I was equally interested in the empirical study they did to see how much of a difference they were making. That study told them that when their tool underperformed, it was most often because it couldn't handle renamings well, which in turn tells them what they need to work on next.
  2. Andrew Meneely, Pete Rotella, and Laurie Williams: "Does Adding Manpower Also Affect Quality? An Empirical Longitudinal Analysis".
    With each new developer to a software development team comes a greater challenge to manage the communication, coordination, and knowledge transfer amongst teammates. Fred Brooks discusses this challenge in The Mythical Man-Month by arguing that rapid team expansion can lead to a complex team organization structure. While Brooks focuses on productivity loss as the negative outcome, poor product quality is also a substantial concern. But if team expansion is unavoidable, can any quality impacts be mitigated? Our objective is to guide software engineering managers by empirically analyzing the effects of team size, expansion, and structure on product quality. We performed an empirical, longitudinal case study of a large Cisco networking product over a five-year history. Over that time, the team underwent periods of no expansion, steady expansion, and accelerated expansion. Using team-level metrics, we quantified characteristics of team expansion, including team size, expansion rate, expansion acceleration, and modularity with respect to department designations. We examined statistical correlations between our monthly team-level metrics and monthly product-level metrics. Our results indicate that increased team size and linear growth are correlated with later periods of better product quality. However, periods of accelerated team expansion are correlated with later periods of reduced software quality. Furthermore, our linear regression prediction model based on team metrics was able to predict the product's post-release failure rate within a 95% prediction interval for 38 out of 40 months. Our analysis provides insight for project managers into how the expansion of development teams can impact product quality.
    The Mythical Man-Month is the most-quoted book in software engineering. Here, the authors test its central claim by looking at what effect expanding a development team has on downstream fault rates; in particular, they look at how the rate of team expansion correlates with defects later on. Their finding is that growth on its own doesn't hurt quality: it's rapid growth that causes problems.
  3. Zuoning Yin, Ding Yuan, Yuanyuan Zhou, Shankar Pasupathy, and Lakshmi Bairavasundaram: "How Do Fixes Become Bugs?"
    This paper presents a comprehensive characteristic study on incorrect bug-fixes from large operating system code bases including Linux, OpenSolaris, FreeBSD and also a mature commercial OS developed and evolved over the last 12 years, investigating not only the mistake patterns during bug-fixing but also the possible human reasons in the development process when these incorrect bug-fixes were introduced. Our major findings include: (1) at least 14.8%-24.4% of sampled fixes for post-release bugs in these large OSes are incorrect and have made impacts to end users. (2) Among several common bug types, concurrency bugs are the most difficult to fix correctly: 39% of concurrency bug fixes are incorrect. (3) Developers and reviewers for incorrect fixes usually do not have enough knowledge about the involved code. For example, 27% of the incorrect fixes are made by developers who have never touched the source code files associated with the fix. Our results provide useful guidelines to design new tools and also to improve the development process. Based on our findings, the commercial software vendor whose OS code we evaluated is building a tool to improve the bug fixing and code reviewing process.
    This paper's starting point is something every seasoned developer knows: bug fixes are often buggy themselves. But how buggy? And are fixes for some kinds of bugs more error-prone than others? This paper examines 12 years of data from four operating systems to produce the statistics and recommendations summarized in the abstract. (Not surprisingly, concurrency and memory-management bugs are the hardest ones to fix correctly.) Given that testing and code review resources are always in short supply, this kind of information can help teams focus their efforts where they'll do the most good.