It Will Never Work in Theory

Short summaries of recent results in empirical software engineering research

2021-09-20: Program comprehension of domain-specific and general-purpose languages
Keywords: Domain-Specific Languages, Program Comprehension
Reviewed by: Greg Wilson

Sometimes software engineering research is like studying supernovas: repeatability isn't feasible (at least, not given current technology), so you have to study the ones you can find. Other times, though, researchers can do controlled experiments and replicate the experiments of others. Kosar2018 is an example of that: in it, the authors replicate an experiment reported in Kosar2011 with several improvements to remove some threats to validity. Their conclusion: developers are significantly better at tool-based program comprehension when using a DSL than when using a general-purpose language. This is an important result, though the adjective "tool-based" does qualify it a bit....

2021-09-20: Do Developers Read Compiler Error Messages?
Keywords: Error Messages
Reviewed by: Greg Wilson

Earlier this month we reviewed Becker2019, which found that most of the error messages produced by compilers aren't particularly helpful. One response was, "Well, but nobody reads them anyway." Barik2017 showed that this isn't true. In an eye-tracking study of 56 students writing Java with Eclipse, its authors found that "participants read error messages, and the difficulty of reading these messages is comparable to the difficulty of reading source code", "difficulty reading error messages significantly predicts participants' task performance" (i.e., the harder the messages are to read, the longer it takes to fix the problem), and "participants allocate a substantial...

2021-09-19: Reading Answers on Stack Overflow: Not Enough!
Keywords: Crowdsourcing, Stack Overflow
Reviewed by: Greg Wilson

I spoke with someone earlier this year who had been using the Unix shell for several years but had never used the man command. Whenever they had a question they went to Stack Overflow: experience had taught them that they could find their answer there more quickly than by hunting through comprehensive breadth-first documentation written by people who are guessing what the reader wants to know rather than responding to the actual gaps in their knowledge. (See this post for more on documentation types and their audiences.) The answers on Stack Overflow are only part of the story, though. As...

2021-09-19: Impact of task switching and work interruptions on software development processes
Keywords: Productivity
Reviewed by: Greg Wilson

Last month's review of Abad2018 proved quite popular, so we decided to look at another study of the effect of interruptions on programmers' productivity. Unsurprisingly, Tregubov2017 found a moderately strong correlation between the number of projects someone is working on and the frequency with which they are interrupted. What was surprising was that working on more projects didn't correlate with cross-project interrupts, so it's possible that organizations that require people to multitask are simply more interrupt-prone. (Note that students are typically required to timeslice five simultaneous courses; given that the frequency of interruptions correlates negatively with productivity, it's not surprising...

2021-09-18: Code of Conduct in Open Source Projects
Keywords: Open Source
Reviewed by: Greg Wilson

It's easy to believe that the world only ever gets worse, but it's not true. A decade ago, most open source projects didn't have codes of conduct, and many others were in the midst of bitter arguments over whether they even should. Today, having a CONDUCT.md file in a project's root directory is as normal as having a license. Enforcement may be spotty, and codes of conduct won't solve tech's systemic failures on their own, but it is progress. Tourani2017 was the first empirical study of codes of conduct in open source. Its authors found that eleven codes of conduct...

2021-09-17: Why Do Developers Use Trivial Packages?
Keywords: Packaging
Reviewed by: Greg Wilson

"Reduce, re-use, recycle" is probably the most useful advice someone can give a young programmer, but is it possible to take re-use too far? In particular, is it worth creating and sharing a library that contains only one small function? Conversely, is it sensible to use such micro-libraries? To find out, the authors of Abdalkareem2017 looked at 230,000 NPM packages and 38,000 JavaScript applications. At the time (four years ago) 16.8% of the packages used were trivial, which they defined as having no more than 35 lines of code and a McCabe complexity score no greater than 10. (See this...

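The triviality threshold quoted in the summary above is simple enough to state as a predicate. What follows is a minimal sketch, assuming lines of code and McCabe complexity have already been measured by some other tool; the function and constant names are hypothetical and are not taken from Abdalkareem2017.

```python
# Thresholds as quoted in the summary above; all names here are illustrative only.
MAX_TRIVIAL_LOC = 35
MAX_TRIVIAL_COMPLEXITY = 10


def is_trivial(lines_of_code: int, mccabe_complexity: int) -> bool:
    """Return True if a package meets both triviality thresholds."""
    return (
        lines_of_code <= MAX_TRIVIAL_LOC
        and mccabe_complexity <= MAX_TRIVIAL_COMPLEXITY
    )


print(is_trivial(20, 3))  # small and simple
print(is_trivial(36, 3))  # one line over the size limit
```

Both conditions must hold, so a short but convoluted package would not count as trivial under this reading.
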
2021-09-16: Studying the relationship between exception handling practices and post-release defects
Keywords: Faults, Quality
Reviewed by: Greg Wilson

In this paper from 2018, De Pádua and Shang look at the relationship between anti-patterns in exception handling and error rates in the related code. Unsurprisingly, they find that bad exception handling practices correlate with post-release defects. What makes the work (much) more interesting is that only some anti-patterns correlate in a statistically significant way: The longer the exception handling blocks in a file, the more likely the file is to contain bugs. What's more, the length of the file and the length of its exception handling blocks aren't correlated, so exception handler length really does contain novel information. The...
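
To make the "length of exception handling blocks" metric concrete, here is a rough sketch of how one might measure it. It uses Python's standard `ast` module purely as a stand-in; the paper's own tooling and exact counting rules are not described in this summary, and the function name `handler_lengths` is hypothetical.

```python
import ast


def handler_lengths(source: str) -> list[int]:
    """Return the length in lines of each `except` block in the source.

    Length is counted from the `except` line through the last line of
    the handler body (an assumption; the paper may count differently).
    """
    tree = ast.parse(source)
    return [
        node.end_lineno - node.lineno + 1
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler)
    ]


example = (
    "try:\n"
    "    value = int('x')\n"
    "except ValueError:\n"
    "    value = 0\n"
    "    print('bad input')\n"
)
print(handler_lengths(example))  # one handler spanning three lines
```

Summing or averaging these lengths per file would give a number that could then be compared against that file's defect count.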

2021-09-16: Analyzing the effects of test driven development in GitHub
Keywords: Test-Driven Development, Testing
Reviewed by: Greg Wilson

Borle2017 is yet another study showing that test-driven development (TDD) doesn't have any significant impact. The authors looked at Java projects on GitHub for evidence of TDD, then looked for evidence that projects using TDD outperformed projects that didn't. Their conclusions: very few projects actually use it, and those that do don't perform any better than those that don't. This is our sixth post about TDD, and the sixth time the conclusion has been that it doesn't make a difference. The studies we've discussed have used different methods and different datasets, but have all reached the same conclusion, so until...

2021-09-15: Categorizing the Content of GitHub README Files
Keywords: Documentation
Reviewed by: Greg Wilson

Astronomers have to analyze the optical properties of the glass in their telescopes in order to correct for things like chromatic aberration. Equally, software engineering researchers need to study and validate the tools they build to collect and classify data in order to know how reliable those tools are. Prana2018 is a good example of this. Its authors built a classifier to label the sections in the README files found in GitHub repositories as What, Why, How, When, Who, References, Contribution, or Other. They then evaluated the classifier numerically (F-score of 0.746) and by having twenty programmers check whether the...

2021-09-13: What's Wrong With Tech Hiring
Keywords: Hiring
Reviewed by: Greg Wilson

I went through a lot of interviews after being laid off by RStudio earlier this year. Some really impressed me (in particular, I think Automattic's hiring process was excellent), but many were haphazard, confusing, or misguided. I'm not alone in feeling this way: Behroozi2019 found that the questions asked in technical interviews are often seen as irrelevant to actual work, frequently depend on knowledge of trivia, cause a lot of anxiety (which means the results aren't representative of on-job performance), and bias the process in favor of people with lots of free time (i.e., who don't have kids or are affluent enough...