Restarted and Flaky Builds on Travis CI

Reviewed by Greg Wilson / 2021-10-19
Keywords: Continuous Integration

As noted yesterday, the spread of continuous integration (CI) has changed software development just as much as reliance on Q&A sites. The study of failing and restarted CI jobs reported in Durieux2020 gives us yet more insight into how CI actually works in practice:

More mature and more complex projects are more likely to include restarted builds.
Builds are mostly restarted because of a failing test, network problem, or Travis CI limitation such as execution timeout.
In over half of the restarted builds, the developers analyze and restart a build within an hour of the initial build execution, which suggests developers interrupt their workflow to wait for CI results.
Restarted builds slow down the merging of pull requests by a factor of three, bringing median merging time from 16 hours to 48 hours.

Durieux2020 Thomas Durieux, Claire Le Goues, Michael Hilton, and Rui Abreu: "Empirical Study of Restarted and Flaky Builds on Travis CI". Proc. International Conference on Mining Software Repositories (MSR), 2020, doi:10.1145/3379597.3387460.

Continuous Integration (CI) is a development practice where developers frequently integrate code into a common codebase. After the code is integrated, the CI server runs a test suite and other tools to produce a set of reports (e.g., the output of linters and tests). If the result of a CI test run is unexpected, developers have the option to manually restart the build, re-running the same test suite on the same code; this can reveal build flakiness, if the restarted build outcome differs from the original build. In this study, we analyze restarted builds, flaky builds, and their impact on the development workflow. We observe that developers restart at least 1.72% of builds, amounting to 56,522 restarted builds in our Travis CI dataset. We observe that more mature and more complex projects are more likely to include restarted builds. The restarted builds are mostly builds that are initially failing due to a test, network problem, or a Travis CI limitation such as execution timeout. Finally, we observe that restarted builds have an impact on development workflow. Indeed, in 54.42% of the restarted builds, the developers analyze and restart a build within an hour of the initial build execution. This suggests that developers wait for CI results, interrupting their workflow to address the issue. Restarted builds also slow down the merging of pull requests by a factor of three, bringing median merging time from 16h to 48h.
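
The abstract's definition of flakiness (a restart of the same code whose outcome differs from the original run) is simple enough to sketch in a few lines of Python. This is only an illustration of the definition, not the authors' analysis pipeline: the `BuildRun` record, its field names, and the status strings are all assumptions made for the example and do not reflect Travis CI's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildRun:
    """One execution of a CI build (hypothetical record, not Travis CI's schema)."""
    build_id: str
    commit: str
    status: str  # assumed values such as "passed", "failed", "errored"


def is_flaky(original: BuildRun, restarted: BuildRun) -> bool:
    """Return True when re-running the same code produced a different outcome."""
    if original.commit != restarted.commit:
        raise ValueError("a restart must re-run the same code")
    return original.status != restarted.status


def restart_rate(restarted_builds: int, total_builds: int) -> float:
    """Percentage of builds that were restarted at least once."""
    if total_builds <= 0:
        raise ValueError("total_builds must be positive")
    return 100.0 * restarted_builds / total_builds


# Made-up (original, restarted) pairs, purely for illustration.
pairs = [
    (BuildRun("b1", "abc123", "failed"), BuildRun("b1", "abc123", "passed")),
    (BuildRun("b2", "def456", "failed"), BuildRun("b2", "def456", "failed")),
]

# Only b1 changed outcome between the original run and the restart.
flaky_ids = [orig.build_id for orig, rerun in pairs if is_flaky(orig, rerun)]
```

Under this definition a restart that fails the same way twice is not counted as flaky, which is one reason restarted builds and flaky builds are reported separately.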
