Experimental Assessment of Software Metrics Using Automated Refactoring

Reviewed by Felienne Hermans / 2013-02-12
Keywords: Metrics, Refactoring

Cinneide2012 Mel Ó Cinnéide, Laurence Tratt, Mark Harman, Steve Counsell, and Iman Hemati Moghadam: "Experimental assessment of software metrics using automated refactoring". Proceedings of the ACM-IEEE international symposium on Empirical software engineering and measurement - ESEM '12, 10.1145/2372251.2372260.

The impact and applicability of software metrics continue to be a subject of debate, especially since many metrics claim to measure similar properties, such as cohesion. This raises the question of the extent to which these metrics actually agree with one another.

The interesting idea that this paper proposes is to not only analyze the agreement and disagreement of metrics, but to also investigate how the metrics change on refactored versions of the same code. The authors do so by randomly applying automated refactorings to a code base and observing how these refactorings impact the metrics. By running this automated refactoring analysis, the authors want to distinguish between what they call volatile metrics, those that are easily impacted, and inert metrics, which hardly change under refactoring. Furthermore, they want to know how metrics change in relation to one another: are there refactorings that cause one metric to increase while another (supposedly measuring a similar property) decreases?
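A minimal sketch of what such an experimental loop might look like is given below. The `CodeBase` interface and all names are hypothetical placeholders for illustration, not the authors' actual tool (which applies behaviour-preserving refactorings to real Java systems):

```java
import java.util.*;
import java.util.function.ToDoubleFunction;

// Hypothetical skeleton of the experiment: apply random refactorings and
// record, per metric, how often its value moves and in which direction.
public class MetricVolatilityExperiment {

    /** Placeholder for a parsed code base; assumed refactorings preserve behaviour. */
    interface CodeBase {
        CodeBase applyRandomRefactoring(Random rng);
    }

    /** Tally of how a single metric responded across all applied refactorings. */
    static final class Tally {
        int increased, decreased, unchanged;

        /** Fraction of refactorings that changed the metric at all. */
        double volatility() {
            int total = increased + decreased + unchanged;
            return total == 0 ? 0.0 : (double) (increased + decreased) / total;
        }
    }

    static Map<String, Tally> run(CodeBase initial,
                                  Map<String, ToDoubleFunction<CodeBase>> metrics,
                                  int refactorings,
                                  long seed) {
        Random rng = new Random(seed);
        Map<String, Tally> tallies = new LinkedHashMap<>();
        metrics.keySet().forEach(name -> tallies.put(name, new Tally()));

        CodeBase current = initial;
        for (int i = 0; i < refactorings; i++) {
            CodeBase next = current.applyRandomRefactoring(rng);
            for (var entry : metrics.entrySet()) {
                double before = entry.getValue().applyAsDouble(current);
                double after  = entry.getValue().applyAsDouble(next);
                Tally t = tallies.get(entry.getKey());
                if (after > before)      t.increased++;
                else if (after < before) t.decreased++;
                else                     t.unchanged++;
            }
            current = next;
        }
        return tallies;
    }
}
```

In this reading, a metric's volatility is simply the fraction of refactorings for which its value moved at all, and an inert metric is one for which that fraction stays close to zero.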

They applied their method to 300 KLOC of Java code from 8 open source systems and investigated the following five cohesion metrics:

  • Tight Class Cohesion (TCC)
  • Lack of Cohesion in Methods (LCOM5)
  • Class Cohesion (CC)
  • Sensitive Class Cohesion (SCOM)
  • Low-level Similarity-Based Class Cohesion (LSCC)
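To give a flavour of what these metrics measure, the commonly used definition of TCC (stated here from the general literature, not restated from this paper) is the fraction of method pairs in a class that are directly connected, i.e. that share at least one instance variable:

```latex
% TCC for a class with N methods: NDC is the number of "directly connected"
% method pairs (both methods access at least one common instance variable),
% and NP is the total number of method pairs.
\[
  \mathrm{TCC} = \frac{NDC}{NP},
  \qquad
  NP = \frac{N(N-1)}{2}
\]
```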

Their evaluation shows that LSCC, CC and LCOM5 are all highly volatile metrics: in 99% of the refactorings, their values either increased or decreased. The results, however, differed across the 8 systems under consideration. In one case, for example, all metrics turned out to be volatile. Even when normalizing for relative volatility, the variance remained high.

In a second evaluation, the relationship between two of the cohesion metrics, LSCC and TCC, is explored in more detail. In particular, the authors study refactorings that lower one of these two metrics while raising the other.
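Conceptually, spotting such a conflict only requires comparing the direction of change of the two metrics for each refactoring. A minimal sketch, with hypothetical names and illustrative deltas:

```java
/** Minimal sketch (not the authors' code): classify how a pair of cohesion
 *  metrics, e.g. LSCC and TCC, moved under a single refactoring. */
public class MetricPairClassifier {

    enum PairEffect { AGREE, CONFLICT, AT_MOST_ONE_CHANGED }

    /** Each delta is (value after the refactoring) minus (value before it). */
    static PairEffect classify(double lsccDelta, double tccDelta) {
        if (lsccDelta == 0.0 || tccDelta == 0.0) return PairEffect.AT_MOST_ONE_CHANGED;
        boolean sameDirection = (lsccDelta > 0) == (tccDelta > 0);
        return sameDirection ? PairEffect.AGREE : PairEffect.CONFLICT;
    }

    public static void main(String[] args) {
        // Example: LSCC goes up while TCC goes down -> the metrics disagree.
        System.out.println(classify(+0.12, -0.05)); // CONFLICT
        System.out.println(classify(+0.12, +0.05)); // AGREE
    }
}
```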

What makes this work so interesting, apart from the originality of applying automated refactoring in the context of metrics, is the fact that it changes our perception of metrics. Where we previously assumed that different metrics for cohesion were mainly a matter of taste (and hence debate), this paper finds that metrics not only differ, but can outright conflict in many cases.