What Code is Deliberately Excluded From Test Coverage, and Why?

Reviewed by Greg Wilson / 2021-09-01
Keywords: Testing

I guess I should have seen this result coming. In my last full-time development job, I was responsible for building automated tests for a moderately complex Django application. After eight months of hard work, I had code coverage up over 90%. A year after I left, I discovered that developers had disabled tests one by one as they modified features until coverage was... well, they wouldn't tell me exactly how low it was, but it was low enough that they had stopped bothering to collect or report coverage.

Wind the clock forward ten years to Hora2021b, which looks at how often developers deliberately exclude code from test coverage, and why. The paper's introduction succinctly summarizes its research questions (RQs) and findings:

  • RQ1: How frequently is code excluded from test coverage? Over one-third of the analyzed projects (20 out of 55) perform deliberate code coverage exclusion. In total, those projects use the exclusion feature in 534 cases.
  • RQ2: When is code excluded from test coverage? Most code is excluded from coverage analysis since its creation (75%), meaning they are already created using the exclusion feature. In 25% of the cases, the exclusion feature is added over time (24 days later, on the median).
  • RQ3: What code is excluded from test coverage? Most of the excluded code happens in conditional statements (42%) and exception handling (29%). Developers tend to exclude non-runnable, debug-only, and defensive code, but also platform-specific and conditional importing.
  • RQ4: Why is code excluded from test coverage? We find that most code is excluded because it is already untested (22%), low-level (20%), or complex (15%). Other rationales are related to deprecation/legacy code, parallelism, trivial/safe code, and non-determinism.
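Since the study mines Python projects, it may help to see what these exclusions look like in practice. Assuming the projects use coverage.py (the common Python coverage tool), the exclusion feature is typically a `# pragma: no cover` comment; when that comment sits on a line that opens a block, such as an `if`, `def`, or `except` clause, coverage.py excludes the whole block. The hypothetical module below (every function name and path is invented for illustration) sketches one instance of each RQ3 category: conditional importing, platform-specific code, defensive code, and debug-only code.

```python
import sys

# Conditional importing: only one branch can run on a given interpreter,
# so the fallback branch is a common candidate for exclusion.
try:
    import tomllib  # standard library on Python 3.11+
except ImportError:  # pragma: no cover
    tomllib = None


# Platform-specific code: the Windows branch never runs on a Linux CI machine.
def config_dir() -> str:
    if sys.platform == "win32":  # pragma: no cover
        return r"C:\ProgramData\myapp"  # hypothetical path
    return "/etc/myapp"  # hypothetical path


# Defensive code: a branch the author believes callers will never reach.
def parse_level(level: str) -> int:
    levels = {"low": 1, "medium": 2, "high": 3}
    if level not in levels:  # pragma: no cover
        raise ValueError(f"unknown level: {level!r}")
    return levels[level]


# Debug-only code: a helper used interactively, never by the test suite.
def dump_state(state: dict) -> None:  # pragma: no cover
    for key, value in sorted(state.items()):
        print(f"{key} = {value}")
```

Note that none of these pragmas records *why* the code was left out; a reader has to infer the rationale, which is the same gap the paper had to bridge by reading commit messages and comments.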

The author then discusses "…the enhancement of coverage tools with mandatory explanations for the exclusion features; the proposal of project guidelines to enforce explanations when using the exclusion feature; the improvement of test coverage tools' documentation with novel exclusion examples; the detection of trivial/safe candidates for coverage exclusion to produce more accurate test coverage reports; and techniques to spot biased coverage reports as well as to detect project-specific test coverage exclusion." While I would have liked to see some discussion of guidelines for designing software so that there isn't any reason (or temptation) to leave it out of coverage reporting, this paper is exactly the kind of thorough, detailed examination of a specific practice that I think our field needs.
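The first two proposals, mandatory explanations and project guidelines that enforce them, could be approximated with a small lint check. The sketch below assumes a hypothetical project convention in which any text after the pragma (for example `# pragma: no cover - CI runs on Linux only`) counts as the explanation; coverage.py itself neither requires nor parses such text. The regular expression only roughly mirrors coverage.py's default exclusion marker, and `unexplained_exclusions` is an illustrative name, not an existing tool.

```python
import re
from typing import Iterable, List, Tuple

# Roughly mirrors coverage.py's default "# pragma: no cover" marker and captures
# whatever follows it on the same line as the (hypothetical) explanation.
PRAGMA = re.compile(r"#\s*pragma:\s*no\s*cover\b(?P<reason>.*)$", re.IGNORECASE)


def unexplained_exclusions(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, line) pairs whose exclusion pragma gives no reason."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        match = PRAGMA.search(line)
        if match is None:
            continue  # no exclusion on this line
        # Drop separators such as "-" or ":" that often precede a reason.
        reason = match.group("reason").strip(" -:;,")
        if not reason:
            problems.append((lineno, line.rstrip()))
    return problems


if __name__ == "__main__":
    source = '''\
if sys.platform == "win32":  # pragma: no cover - CI runs on Linux only
    ...
def dump_state(state):  # pragma: no cover
    ...
'''.splitlines()
    print(unexplained_exclusions(source))
    # prints [(3, 'def dump_state(state):  # pragma: no cover')]
```

A check like this could run in CI or as a pre-commit hook, failing the build whenever it returns a non-empty list, which would turn an unwritten guideline into an enforced one.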

Hora2021b: Andre Hora, "What Code Is Deliberately Excluded from Test Coverage and Why?", 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), doi:10.1109/msr52588.2021.00051.

Test coverage is largely used to assess test effectiveness. In practice, not all code is equally important for coverage analysis, for instance, code that will not be executed during tests is irrelevant and can actually harm the analysis. Some coverage tools provide support for code exclusion from coverage reports, however, we are not yet aware of what code tends to be excluded nor the reasons behind it. This can support the creation of more accurate coverage reports and reveal novel and harmful usage cases. In this paper, we provide the first empirical study to understand code exclusion practices in test coverage. We mine 55 Python projects and assess commit messages and code comments to detect rationales for exclusions. We find that (1) over 1/3 of the projects perform deliberate coverage exclusion; (2) 75% of the code are already created using the exclusion feature, while 25% add it over time; (3) developers exclude non-runnable, debug-only, and defensive code, but also platform-specific and conditional importing; and (4) most code is excluded because it is already untested, low-level, or complex. Finally, we discuss implications to improve coverage analysis and shed light on the existence of biased coverage reports.