How Effective is Continuous Integration in Indicating Single-Statement Bugs?

Reviewed by Greg Wilson / 2022-04-08
Keywords: Continuous Integration, Software Quality

Single-statement bugs (SStuBs) are to research on bugs and bug fixes what E. coli is to genetics: small enough that they can actually be understood but still complex enough to shed light on larger problems. Latendresse2021 looked closely at commits that introduced 318 SStuBs in 14 open-source Java projects and found that, "Of the 240 SStuB-related builds [by the continuous integration system], only 7.5% (18) fail just before the fixing commit." In other words, "…the majority of the builds do not fail when SStuBs were introduced to the code base, neither at the builds preceding the fix commit. In fact, most of the studied SStuBs stayed hidden in the code for more than a month, with 22% of them staying in the code for at least 6 months." What's even more depressing, "From the 23 failed builds, none are caused by a test affected by SStuBs. Most of the failed builds (14 out of 23) are caused by external failures, such as dependency errors, and did not even execute the test suite."
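To make concrete how a build can stay green while a single-statement bug sits in the code, here is a minimal, hypothetical sketch. It is written in Python for brevity rather than the Java used by the studied projects, and the function names, the bug, and the test cases are all invented for illustration; none of them come from the paper. The bug is a one-token change (`>` instead of `>=`), and the made-up test suite never probes the boundary value, so it passes both before and after the fix.

```python
def is_adult_buggy(age: int) -> bool:
    # Hypothetical single-statement bug: '>' should have been '>='.
    return age > 18


def is_adult_fixed(age: int) -> bool:
    # The one-token fix.
    return age >= 18


def run_suite(fn) -> bool:
    """Hypothetical test suite that never checks the boundary value 18."""
    cases = [(5, False), (17, False), (30, True), (65, True)]
    return all(fn(age) == expected for age, expected in cases)


# Both "builds" are green, so the suite cannot tell the two versions apart.
buggy_build_green = run_suite(is_adult_buggy)
fixed_build_green = run_suite(is_adult_fixed)

# Only an input at the boundary distinguishes the buggy and fixed versions.
boundary_exposes_bug = is_adult_buggy(18) != is_adult_fixed(18)
```

In this sketch the bug would only surface if someone added a test for `age == 18`, which is one plausible reading of why none of the failed builds in the study were caused by a test affected by a SStuB.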

These findings have me wondering whether CI systems aren't finding small bugs because developers fix them before making their work public. I run tests in an isolated container using Earthly before committing or pushing code; I'd be curious to know if DI (desktop integration) is catching things before CI gets a chance.

Latendresse2021 Jasmine Latendresse, Rabe Abdalkareem, Diego Elias Costa, and Emad Shihab. How effective is continuous integration in indicating single-statement bugs? In Proc. MSR 2021, doi:10.1109/msr52588.2021.00062.

Continuous Integration (CI) is the process of automatically compiling, building, and testing code changes in the hope of catching bugs as they are introduced into the code base. With bug fixing being a core and increasingly costly task in software development, the community has adopted CI to mitigate this issue and improve the quality of their software products. However, little is known about how effective CI is at detecting simple, single-statement bugs.

In this paper, we analyze how effectively CI warns about 318 single-statement bugs (SStuBs) in 14 popular open-source Java-based projects. We analyze the build status at the commits that introduce SStuBs and at the commits just before the SStuBs are fixed. We then investigate how often CI indicates the presence of these bugs through test failures. Our results show that only 2% of the commits that introduced SStuBs have builds with failed tests and 7.5% of builds before the fix reported test failures. Upon close manual inspection, we found that none of the failed builds actually captured SStuBs, indicating that CI is not the right medium to capture the SStuBs we studied. Our results suggest that developers should either not rely on CI to catch SStuBs or increase their CI pipeline coverage so that it can detect single-statement bugs.