Extracting Rationale for Open Source Development Decisions

Reviewed by Greg Wilson / 2022-04-25
Keywords: Open Source

I've never tried to measure how much of my time I spend staring at a screen full of code and wondering, "Why the hell is it doing that?" Even small teams with little turnover can struggle to keep track of the reasons for particular decisions, and newcomers frequently have to recapitulate their predecessors' journeys of discovery in order to figure out why alpha has to be re-initialized at that particular moment.

Sharma2021 explores how well decision rationales can be extracted after the fact from Python's email archives. The authors combine term patterns such as Proposal-State-Reason, proximity-based heuristics, role-based heuristics, and the presence of special terms like "BDFL pronouncement" to identify promising messages, then score how well parameterized combinations of those heuristics perform against a manually curated set of messages. They found that, "If we consider the top 10 ranked results, [aggregating at the message level] captures 74% and 86% of rationale sentences, respectively; and top 15 increases this further to 82% and 91%, respectively." That's much better than I would have expected, and I hope the authors will make their tool available as a specialized search engine for practitioners to try out.
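To make the idea of combining heuristics and scoring the top-ranked results concrete, here is a minimal sketch in Python. It is not the authors' tool: the cue lists, weights, window size, role names, and sample messages are all made up for illustration, and it measures recall over whole messages rather than over rationale sentences as the paper does.

```python
import re
from dataclasses import dataclass

# Hypothetical cue lists, roles, and weights, chosen only for illustration.
SPECIAL_TERMS = ("bdfl pronouncement",)
STATE_TERMS = ("accepted", "rejected")
REASON_TERMS = ("because", "since", "consensus")
CORE_ROLES = {"bdfl", "core developer"}
WEIGHTS = {"special": 3.0, "proximity": 2.0, "role": 1.0}


@dataclass
class Message:
    msg_id: str
    author_role: str
    text: str


def score(msg: Message, proposal: str, window: int = 12) -> float:
    """Sum the weights of the heuristics that fire for one message."""
    words = re.findall(r"[a-z0-9']+", msg.text.lower())
    joined = " ".join(words)
    total = 0.0
    # Special-term heuristic: an explicit pronouncement phrase appears.
    if any(term in joined for term in SPECIAL_TERMS):
        total += WEIGHTS["special"]
    # Proximity heuristic: the proposal is mentioned, and a state term
    # and a reason term occur within `window` words of each other.
    states = [i for i, w in enumerate(words) if w in STATE_TERMS]
    reasons = [i for i, w in enumerate(words) if w in REASON_TERMS]
    near = any(abs(s - r) <= window for s in states for r in reasons)
    if proposal.lower() in joined and near:
        total += WEIGHTS["proximity"]
    # Role heuristic: the author holds a decision-making role.
    if msg.author_role in CORE_ROLES:
        total += WEIGHTS["role"]
    return total


def top_k_recall(messages, proposal, gold_ids, k):
    """Fraction of hand-labelled rationale messages found in the top k."""
    ranked = sorted(messages, key=lambda m: score(m, proposal), reverse=True)
    hits = {m.msg_id for m in ranked[:k]} & gold_ids
    return len(hits) / len(gold_ids)


# Invented sample messages and an invented gold set.
messages = [
    Message("m1", "bdfl", "BDFL pronouncement: PEP 308 is accepted "
            "because the community vote showed consensus."),
    Message("m2", "contributor", "I think PEP 308 looks ugly."),
    Message("m3", "core developer", "PEP 308 was rejected earlier "
            "since the syntax was unclear."),
    Message("m4", "contributor", "Does anyone know when the next release is?"),
]
gold = {"m1", "m3"}

for k in (1, 2):
    print(k, top_k_recall(messages, "PEP 308", gold, k))
```

Changing the values in `WEIGHTS` and re-running `top_k_recall` is a toy version of scoring parameterized combinations of heuristics against a curated set.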

Sharma2021 Pankajeshwara Nand Sharma, Bastin Tony Roy Savarimuthu, and Nigel Stanger. Extracting rationale for open source software development decisions—a study of Python email archives. Proc. ICSE 2021, doi:10.1109/icse43902.2021.00095.

A sound Decision-Making (DM) process is key to the successful governance of software projects. In many Open Source Software Development (OSSD) communities, DM processes lie buried amongst vast amounts of publicly available data. Hidden within this data lie the rationale for decisions that led to the evolution and maintenance of software products. While there have been some efforts to extract DM processes from publicly available data, the rationale behind 'how' the decisions are made have seldom been explored. Extracting the rationale for these decisions can facilitate transparency (by making them known), and also promote accountability on the part of decision-makers. This work bridges this gap by means of a large-scale study that unearths the rationale behind decisions from Python development email archives comprising about 1.5 million emails. This paper makes two main contributions. First, it makes a knowledge contribution by unearthing and presenting the rationale behind decisions made. Second, it makes a methodological contribution by presenting a heuristics-based rationale extraction system called Rationale Miner that employs multiple heuristics, and follows a data-driven, bottom-up approach to infer the rationale behind specific decisions (e.g., whether a new module is implemented based on core developer consensus or benevolent dictator's pronouncement). Our approach can be applied to extract rationale in other OSSD communities that have similar governance structures.