"Cloning considered harmful" considered harmful

Posted Aug 16, 2011 by Jorge Aranda

Cory J. Kapser and Michael W. Godfrey. "Cloning considered harmful" considered harmful: patterns of cloning in software. Empirical Software Engineering 13, 2008.

Literature on the topic of code cloning often asserts that duplicating code within a software system is a bad practice, that it causes harm to the system's design and should be avoided. However, in our studies, we have found significant evidence that cloning is often used in a variety of ways as a principled engineering tool. For example, one way to evaluate possible new features for a system is to clone the affected subsystems and introduce the new features there, in a kind of sandbox testbed. As features mature and become stable within the experimental subsystems, they can be migrated incrementally into the stable code base; in this way, the risk of introducing instabilities in the stable version is minimized. This paper describes several patterns of cloning that we have observed in our case studies and discusses the advantages and disadvantages associated with using them. We also examine through a case study the frequencies of these clones in two medium-sized open source software systems, the Apache web server and the Gnumeric spreadsheet application. In this study, we found that as many as 71% of the clones could be considered to have a positive impact on the maintainability of the software system.

Lots of people, both in industry and in academia, would say that copy-pasting code is bad practice: if you find yourself copy-pasting code (or, in academic parlance, creating "code clones"), you should refactor it by abstracting the repeated code into its own method and calling that method from every place where a copy used to live. That way, if you need to change it, or if it has a bug, you only have to fix it in one place. Some researchers have built pretty sophisticated tools that will help you find your code clones, so that you can go and exterminate them wherever they are.
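To make the conventional advice concrete, here is a minimal sketch of that refactoring in Python. Everything in it (the function names, the email check) is a made-up illustration and not an example from the paper: two functions start out with the same copy-pasted cleanup logic, and the refactored versions share a single helper instead.

```python
# Before: the same cleanup-and-check logic is copy-pasted into two
# functions. This is the kind of "code clone" the conventional advice targets.
def register_user_cloned(email):
    cleaned = email.strip().lower()
    if "@" not in cleaned:
        raise ValueError(f"invalid email: {email!r}")
    return {"action": "register", "email": cleaned}


def invite_user_cloned(email):
    cleaned = email.strip().lower()
    if "@" not in cleaned:
        raise ValueError(f"invalid email: {email!r}")
    return {"action": "invite", "email": cleaned}


# After: the repeated logic lives in one hypothetical helper, so a bug
# fix or a rule change only has to happen in one place.
def normalize_email(email):
    cleaned = email.strip().lower()
    if "@" not in cleaned:
        raise ValueError(f"invalid email: {email!r}")
    return cleaned


def register_user(email):
    return {"action": "register", "email": normalize_email(email)}


def invite_user(email):
    return {"action": "invite", "email": normalize_email(email)}
```

The trade-off is that the two callers are now coupled through `normalize_email`: if registration and invitation ever need different rules, the shared helper has to grow parameters or be split apart again.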

Kapser and Godfrey, however, explore why developers create code clones, and find (both through argument and through an empirical evaluation) that many code clones are actually OK. In their paper they discuss several kinds of clones (those caused, for instance, by platform variations, boiler-plating, or language idioms) and show that often the right approach is to go ahead and copy-paste code. But they note that whether to clone or not is a decision that requires some thinking on a case-by-case basis:

(...) the results of the case study identify a set of patterns that are most often harmful, namely verbatim snippets and parameterized code. While there were several examples of good usage of these clone patterns, the majority were deemed harmful. This may be an indication that developers should avoid this form of cloning. On the other hand several patterns were found to be mostly good: boiler-plating, replicate and specialize, and cross-cutting concerns. While not always good, when used with care (as with any form of design or implementation decision) these patterns are more likely to achieve an overall beneficial effect on the software system.

In other words: don't demonize code clones, but don't let them slide unquestioningly either.

