Characterizing Single-Statement Bugs in Popular Open-Source Python Projects
Reviewed by Greg Wilson / 2022-03-11
Keywords: Bugs, Open Source, Python
I spent most of an afternoon last week tracking down a bug caused by having two decorators stacked on a function in the wrong order. The fix was small, but the impact was not. Reading this paper has got me wondering how often this happens—how often it turns out that just one line in a program needs to change to make it right.
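The review doesn't say which decorators were involved. The sketch below is a hypothetical illustration of how reordering two stacked decorators changes behavior. functools.lru_cache is the real standard-library decorator; log_calls is a made-up decorator that records every call that reaches it.

```python
import functools

log = []  # records every call that reaches log_calls


def log_calls(func):
    """Hypothetical decorator: append (name, args) to log, then call func."""
    @functools.wraps(func)
    def wrapper(*args):
        log.append((func.__name__, args))
        return func(*args)
    return wrapper


# Cache outermost: repeat calls are answered by the cache and never reach log_calls.
@functools.lru_cache(maxsize=None)
@log_calls
def cache_outside(x):
    return x * x


# log_calls outermost: every call is logged, including cache hits.
@log_calls
@functools.lru_cache(maxsize=None)
def log_outside(x):
    return x * x


for _ in range(3):
    cache_outside(4)
    log_outside(4)

print(log.count(("cache_outside", (4,))))  # 1
print(log.count(("log_outside", (4,))))    # 3
```

Both functions contain exactly the same code, and only the order of the two decorator lines differs. That is the kind of tiny change that is easy to get wrong and slow to diagnose.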
Kamienski and colleagues set out to answer two questions: what are the most common single-statement bugs in Python projects, and how do they differ from those in Java projects? After harvesting code from World of Code, they used diffs to identify single-statement fixes. Both the findings and the cross-language differences are interesting:
Pattern name | Python count | Python % | Java count | Java % |
---|---|---|---|---|
Same Function More Args | 9,958 | 14 | 5,100 | 8 |
Wrong Function/Method Name | 9,901 | 12 | 10,179 | 16 |
Change Identifier Used | 8,973 | 12 | 22,668 | 35 |
Add Function Around Expression | 6,363 | 9 | 0 | 0 |
Change Attribute Used | 5,229 | 7 | 0 | 0 |
Change Numeric Literal | 4,775 | 7 | 5,447 | 8 |
Change Operand | 4,657 | 6 | 807 | 1 |
Same Function Less Args | 3,381 | 5 | 1,588 | 2 |
Add Method Call | 3,338 | 5 | 0 | 0 |
Add Elements to Iterable | 2,541 | 3 | 0 | 0 |
More Specific If | 2,443 | 3 | 2,381 | 4 |
Change Constant Type | 2,199 | 3 | 0 | 0 |
Change Unary Operator | 2,187 | 3 | 1,016 | 2 |
Change Keyword Argument Used | 1,554 | 2 | 0 | 0 |
Change Boolean Literal | 1,466 | 2 | 1,842 | 3 |
Add Attribute Access | 1,439 | 2 | 0 | 0 |
Same Function Wrong Caller | 1,163 | 2 | 1,504 | 2 |
Change Binary Operator | 976 | 1 | 2,241 | 5 |
Less Specific If | 943 | 1 | 2,813 | 4 |
Same Function Swap Args | 336 | <1 | 612 | 1 |
Change Modifier | 0 | 0 | 5,011 | 8 |
Delete Throws Exception | 0 | 0 | 508 | 1 |
Missing Throws Exception | 0 | 0 | 206 | <1 |
Viewed graphically: [figure comparing the Python and Java pattern percentages]
Some patterns only make sense in one language or the other (Python has no throws clauses or access modifiers, for example), and the Python analysis added seven new categories. Both of these weaken any correlation between the two languages' distributions, but even allowing for that, there is very little correspondence. What is perhaps more interesting is that a handful of patterns dominate each language: the top three account for about 38% of Python fixes and nearly 60% of Java fixes. This suggests that tools could focus on those few patterns, or that future languages could be designed to make these errors impossible (or at least less likely).
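As a rough illustration of what a tool focused on the call-related patterns might look like, here is a naive sketch that uses only Python's standard library: difflib to check that a fix replaced exactly one line, and ast to compare the outermost call before and after. It is not the authors' implementation. The function names are hypothetical, only five of the table's patterns are recognized, and everything else is labeled "other".

```python
import ast
import difflib
from typing import List, Optional, Tuple


def single_changed_line(before: str, after: str) -> Optional[Tuple[str, str]]:
    """Return (old_line, new_line) if the fix replaced exactly one line, else None."""
    old, new = before.splitlines(), after.splitlines()
    ops = [op for op in difflib.SequenceMatcher(a=old, b=new).get_opcodes()
           if op[0] != "equal"]
    if len(ops) != 1:
        return None
    tag, i1, i2, j1, j2 = ops[0]
    if tag != "replace" or i2 - i1 != 1 or j2 - j1 != 1:
        return None
    return old[i1], new[j1]


def outer_call(line: str) -> Optional[ast.Call]:
    """Parse one statement and return its outermost call, if any."""
    try:
        tree = ast.parse(line.strip())
    except SyntaxError:
        return None  # e.g. an 'if' header on its own does not parse
    for node in ast.walk(tree):  # breadth-first, so outer calls come first
        if isinstance(node, ast.Call):
            return node
    return None


def call_signature(call: ast.Call) -> List[str]:
    """Positional and keyword arguments as comparable strings."""
    return [ast.dump(arg) for arg in call.args + call.keywords]


def classify(old: str, new: str) -> str:
    """Assign one of a few call-related patterns to a one-line fix."""
    a, b = outer_call(old), outer_call(new)
    if a is None or b is None:
        return "other"
    same_func = ast.dump(a.func) == ast.dump(b.func)
    sig_a, sig_b = call_signature(a), call_signature(b)
    if same_func:
        if len(sig_b) > len(sig_a):
            return "Same Function More Args"
        if len(sig_b) < len(sig_a):
            return "Same Function Less Args"
        if sig_a != sig_b and sorted(sig_a) == sorted(sig_b):
            return "Same Function Swap Args"
        return "other"
    if sig_a == sig_b:
        # Same method name called on a different object, e.g. a.close() -> b.close()
        if (isinstance(a.func, ast.Attribute) and isinstance(b.func, ast.Attribute)
                and a.func.attr == b.func.attr):
            return "Same Function Wrong Caller"
        return "Wrong Function/Method Name"
    return "other"


before = "def area(w, h):\n    total = round(w * h)\n    return total\n"
after = "def area(w, h):\n    total = round(w * h, 2)\n    return total\n"
pair = single_changed_line(before, after)
if pair is not None:
    print(classify(*pair))  # Same Function More Args
```

Working on the syntax tree rather than raw text means that differences in spacing or line positions don't affect the comparison. The price is that a lone line that isn't a complete statement, such as an if header, can't be parsed, so a real tool would need the surrounding context.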
Kamienski2021: Arthur V. Kamienski, Luisa Palechor, Cor-Paul Bezemer, and Abram Hindle. "PySStuBs: Characterizing Single-Statement Bugs in Popular Open-Source Python Projects." In Proc. MSR 2021, doi:10.1109/msr52588.2021.00066.
Single-statement bugs (SStuBs) can have a severe impact on developer productivity. Despite usually being simple and not offering much of a challenge to fix, these bugs may still disturb a developer's workflow and waste precious development time. However, few studies have paid attention to these simple bugs, focusing instead on bugs of any size and complexity. In this study, we explore the occurrence of SStuBs in some of the most popular open-source Python projects on GitHub, while also characterizing their patterns and distribution. We further compare these bugs to SStuBs found in a previous study on Java Maven projects. We find that these Python projects have different SStuB patterns than the ones in Java Maven projects and identify 7 new SStuB patterns. Our results may help uncover the importance of understanding these bugs for the Python programming language, and how developers can handle them more effectively.