Refactoring and Program Comprehension

Reviewed by Greg Wilson / 2023-03-01
Keywords: Program Comprehension, Refactoring

I've spent the last few weeks cleaning up some code at work. Was that a good use of my time? If the finished code is easier for the next person to understand than what I started with, the answer is probably "yes", but that claim needs justification. This paper examines whether refactoring actually improves readability, and reports seven findings, quoted below (with my comments [formatted like this]). As the authors observe in their conclusion, insights like these should drive the design of the next generation of refactoring tools in IDEs.

The isolated application of certain refactorings may not affect the program readability, but other collateral code-changing activities may do. [For example, reordering parameters doesn't affect code readability directly, but can make the comments explaining the code easier to read.]
Re-organizing the source code with the goal of creating more cohesive components, e.g., methods, has a positive impact on the readability, especially in terms of coherence between identifiers and documentation comments.
The refactorings belonging to the category ‘Moving Features between Objects’ determine variations in terms of readability but do not show any clear indication of their positive or negative impact, most likely because the considered metrics cannot properly measure the overall readability of multiple classes together.
Refactorings aiming at re-organizing data within classes do not seem to have a clear impact on the readability. Removing unnecessary class fields has positive effects, but that depends on how the refactorings are applied.
The renaming of code elements is expected to improve readability, as it is done for that specific goal in mind. Yet, this does not always happen in practice, probably because the choice of the new name may affect metrics influenced by the vocabulary of the identifiers.
Moving the logic along the hierarchy structure creates more cohesive classes and shows better readability to some extent (e.g., fewer number of topics). However, the creation of superclasses inevitably has a negative impact as it makes use of more generic and ambiguous terms. [This finding about superclasses surprised me at first, but not after reading the explanation: a superclass has to use more generic and ambiguous terms.]
Noticeable changes in readability happen when multiple refactoring operations involving renaming are applied in sequence. For example, applying Rename Attribute refactoring alone may not be enough to improve the readability, so Move Attribute refactoring should be contextually done to provide a more suitable class. This is in line with previous findings that showed how the creation of more cohesive components is desirable for readability.
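
[To make the last finding concrete, here is a minimal sketch of Rename Attribute followed by Move Attribute. It is my own illustration, not an example from the paper: the classes `OrderBefore`, `OrderRenamed`, `Customer`, and `Order`, their fields, and the helper `shipping_label` are all hypothetical.]

```python
from dataclasses import dataclass


# Before: the attribute has a vague name and sits on a class it does not describe.
@dataclass
class OrderBefore:
    order_id: int
    addr: str  # actually the customer's shipping address


# After Rename Attribute alone: the name is clearer, but the field is still
# on a class whose other members are about the order, not the customer.
@dataclass
class OrderRenamed:
    order_id: int
    shipping_address: str


# After Rename Attribute plus Move Attribute: the field lives with the
# concept it belongs to (hypothetical Customer class).
@dataclass
class Customer:
    name: str
    shipping_address: str


@dataclass
class Order:
    order_id: int
    customer: Customer


def shipping_label(order: Order) -> str:
    """Build a label; the address is now reached through the customer."""
    return f"Order {order.order_id} -> {order.customer.shipping_address}"
```

[The external behavior is unchanged: the same address is still reachable from an order. What changes is that the identifier and its surrounding class now use the same vocabulary, which is the kind of cohesion the finding describes.]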

Giulia Sellitto, Emanuele Iannone, Zadia Codabux, Valentina Lenarduzzi, Andrea De Lucia, Fabio Palomba, and Filomena Ferrucci. Toward understanding the impact of refactoring on program comprehension. In 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, Mar 2022. doi:10.1109/saner53432.2022.00090.

Software refactoring is the activity associated with developers changing the internal structure of source code without modifying its external behavior. The literature argues that refactoring might have beneficial and harmful implications for software maintainability, primarily when performed without the support of automated tools. This paper continues the narrative on the effects of refactoring by exploring the dimension of program comprehension, namely the property that describes how easy it is for developers to understand source code. We start our investigation by assessing the basic unit of program comprehension, namely program readability. Next, we set up a large-scale empirical investigation—conducted on 156 open-source projects—to quantify the impact of refactoring on program readability. First, we mine refactoring data and, for each commit involving a refactoring, we compute (i) the amount and type(s) of refactoring actions performed and (ii) eight state-of-the-art program comprehension metrics. Afterwards, we build statistical models relating the various refactoring operations to each of the readability metrics considered to quantify the extent to which each refactoring impacts the metrics in either a positive or negative manner. The key results are that refactoring has a notable impact on most of the readability metrics considered.
