It Will Never Work in Theory

Short summaries of recent results in empirical software engineering research

2021-09-18: Code of Conduct in Open Source Projects
Keywords: Open Source
Reviewed by: Greg Wilson

It's easy to believe that the world only ever gets worse, but it's not true. A decade ago, most open source projects didn't have codes of conduct, and many others were in the midst of bitter arguments over whether they even should. Today, having a CONDUCT.md file in a project's root directory is as normal as having a license. Enforcement may be spotty, and codes of conduct won't solve tech's systemic failures on their own, but it is progress. Tourani2017 was the first empirical study of codes of conduct in open source. Its authors found that eleven codes of conduct...

2021-09-17: Why Do Developers Use Trivial Packages?
Keywords: Packaging
Reviewed by: Greg Wilson

"Reduce, re-use, recycle" is probably the most useful advice someone can give a young programmer, but is it possible to take re-use too far? In particular, is it worth creating and sharing a library that contains only one small function? Conversely, is it sensible to use such micro-libraries? To find out, the authors of Abdalkareem2017 looked at 230,000 NPM packages and 38,000 JavaScript applications. At the time (four years ago) 16.8% of the packages used were trivial, which they defined as having no more than 35 lines of code and a McCabe complexity score no greater than 10. (See this...
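The definition of "trivial" quoted above can be written as a one-line predicate. This is only a minimal sketch, not the authors' tooling: the function name `is_trivial` and its parameters are hypothetical, and it assumes the line count and McCabe complexity have already been measured by some other tool.

```python
def is_trivial(lines_of_code: int, mccabe_complexity: int,
               max_loc: int = 35, max_complexity: int = 10) -> bool:
    """Return True if a package meets both thresholds quoted in the summary.

    Hypothetical helper: the defaults mirror the stated limits (no more
    than 35 lines of code, McCabe complexity no greater than 10).
    """
    return lines_of_code <= max_loc and mccabe_complexity <= max_complexity


if __name__ == "__main__":
    # Illustrative inputs only; these are not data from the study.
    print(is_trivial(12, 3))
    print(is_trivial(80, 4))
```

Both limits are inclusive, matching the "no more than" and "no greater than" wording.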

2021-09-16: Studying the relationship between exception handling practices and post-release defects
Keywords: Faults, Quality
Reviewed by: Greg Wilson

In this paper from 2018, De Pádua and Shang look at the relationship between anti-patterns in exception handling and error rates in the related code. Unsurprisingly, they find that bad exception handling practices correlate with post-release defects. What makes the work (much) more interesting is that only some anti-patterns correlate in a statistically significant way: The longer the exception handling blocks in a file, the more likely the file is to contain bugs. What's more, the length of the file and the length of its exception handling blocks aren't correlated, so exception handler length really does contain novel information. The...

2021-09-16: Analyzing the effects of test driven development in GitHub
Keywords: Test-Driven Development, Testing
Reviewed by: Greg Wilson

Borle2017 is yet another study showing that test-driven development (TDD) doesn't have any significant impact. The authors looked at Java projects on GitHub for evidence of TDD, then looked for evidence that projects using TDD outperformed projects that didn't. Their conclusions: very few projects actually use it, and those that do don't perform any better than those that don't. This is our sixth post about TDD, and the sixth time the conclusion has been that it doesn't make a difference. The studies we've discussed have used different methods and different datasets, but have all reached the same conclusion, so until...

2021-09-15: Categorizing the Content of GitHub README Files
Keywords: Documentation
Reviewed by: Greg Wilson

Astronomers have to analyze the optical properties of the glass in their telescopes in order to correct for things like chromatic aberration. Equally, software engineering researchers need to study and validate the tools they build to collect and classify data in order to know how reliable those tools are. Prana2018 is a good example of this. Its authors built a classifier to label the sections in the README files found in GitHub repositories as What, Why, How, When, Who, References, Contribution, or Other. They then evaluated the classifier numerically (F-score of 0.746) and by having twenty programmers check whether the...

2021-09-13: What's Wrong With Tech Hiring
Keywords: Hiring
Reviewed by: Greg Wilson

I went through a lot of interviews after being laid off by RStudio earlier this year. Some really impressed me (in particular, I think Automattic's hiring process was excellent), but many were haphazard, confusing, or misguided. I'm not alone in feeling this way: Behroozi2019 found that the questions asked in technical interviews are often seen as irrelevant to actual work, frequently depend on knowledge of trivia, cause a lot of anxiety (which means the results aren't representative of on-job performance), and bias the process in favor of people with lots of free time (i.e., who don't have kids or are affluent enough...

2021-09-12: Developer Testing in the IDE: Patterns, Beliefs, and Behavior
Keywords: Test-Driven Development, Testing
Reviewed by: Greg Wilson

Back in 2016 we reviewed Beller2015, which looked at when, how, and why developers (don't) test in their IDEs. Beller2019, published four years later, is a deeper and more detailed look at how much testing programmers actually do, and it isn't a pretty picture. After studying almost 2500 software engineers using Java and C# in four different IDEs for two and a half years, the authors found that (among other things): half of developers do not test at all; most programming sessions end without any test execution; 12% of tests show flaky behavior; test-driven development (TDD) is not widely practiced;...

2021-09-11: Common Bug-Fix Patterns: A Large-Scale Observational Study
Keywords: Faults
Reviewed by: Greg Wilson

We know a lot more about the mistakes programmers make and how often they make them than most programmers realize. Campos2017 is an example: its authors analyzed a dataset containing over 4 million bug-fix commits from over 100,000 Java projects and checked their findings against a qualitative analysis of manually curated bugs in a smaller dataset. Using the taxonomy developed in Pan2008, Campos2017 found that the five most common fixes were: fixing the conditional in an if (4.2% of fixes); fixing the value(s) passed in a method call (6.3%); fixing the number or type of value(s) passed in a method...

2021-09-10: Why Software Projects need Heroes: Lessons Learned from 1100+ Projects
Keywords: Code Ownership, Software Projects
Reviewed by: Greg Wilson

Forty-five years ago, Fred Brooks advocated a "chief programmer" model of development similar to the "chief surgeon" model used in most hospitals. It didn't catch on—at least not formally—but as Majumder2019 shows, many projects use it in practice. Their research questions and answers are: How common are hero projects? They define a "hero project" as one in which 5% or fewer of developers are responsible for 95% or more of interactions. They don't just consider commits: looking at both the code interaction graph and the social interaction graph, they find that 80% of the projects they looked at on GitHub...
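The "hero project" threshold quoted above can be checked with a few lines of arithmetic. The sketch below is an illustration under stated assumptions, not the paper's method: it assumes interactions have already been tallied per developer (the paper derives them from code and social interaction graphs), and the name `is_hero_project` and its parameters are hypothetical.

```python
def is_hero_project(interactions_per_dev, dev_pct=5, share_pct=95):
    """Return True if at most dev_pct% of developers account for at
    least share_pct% of all interactions.

    Hypothetical helper: interactions_per_dev is a list of non-negative
    integer counts, one per developer.
    """
    counts = sorted(interactions_per_dev, reverse=True)
    total = sum(counts)
    if total == 0:
        return False  # no interactions, so nothing to attribute

    running = 0
    for k, count in enumerate(counts, start=1):
        running += count
        # Integer arithmetic avoids floating-point edge cases at the limits.
        if running * 100 >= share_pct * total:
            # k is the smallest group of top developers reaching the share.
            return k * 100 <= dev_pct * len(counts)
    return False


if __name__ == "__main__":
    # Illustrative inputs only; these are not data from the study.
    print(is_hero_project([950] + [2] * 19))
    print(is_hero_project([10] * 20))
```

The function finds the smallest group of most-active developers that reaches the interaction share, then checks whether that group is small enough relative to the whole team.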

2021-09-09: Organizing for openness: six models for developer involvement in hybrid OSS projects
Keywords: Governance, Open Source
Reviewed by: Greg Wilson

A statistical study compares a few properties of a large number of subjects numerically. A case study, on the other hand, applies multiple methods to a single subject to "triangulate" on the truth, i.e., to see whether those different perspectives yield similar conclusions. The former are more popular, in part because they require coding rather than close reading, but many of the insights I find most useful come from the latter. Maenpaa2018 is a great example. It examines the organization and governance of six commercially-influenced open source projects: Eclipse, Qt, Sailfish, NetBeans, GTK+, and Vaadin. Their two main questions are...