Automatic Patch Generation Learned from Human-Written Patches
Reviewed by Fayola Peters / 2013-06-06
Keywords: Code Generation, Tools
Kim2013a Dongsun Kim, Jaechang Nam, Jaewoo Song, and Sunghun Kim: "Automatic patch generation learned from human-written patches". 2013 35th International Conference on Software Engineering (ICSE), 10.1109/icse.2013.6606626.
Patch generation is an essential software maintenance task because most software systems inevitably have bugs that need to be fixed. Unfortunately, human resources are often insufficient to fix all reported and known bugs. To address this issue, several automated patch generation techniques have been proposed. In particular, a genetic-programming-based patch generation technique, GenProg, proposed by Weimer et al., has shown promising results. However, these techniques can generate nonsensical patches due to the randomness of their mutation operations.
To address this limitation, we propose a novel patch generation approach, Pattern-based Automatic program Repair (PAR), using fix patterns learned from existing human-written patches. We manually inspected more than 60,000 human-written patches and found there are several common fix patterns. Our approach leverages these fix patterns to generate program patches automatically. We experimentally evaluated PAR on 119 real bugs. In addition, a user study involving 89 students and 164 developers confirmed that patches generated by our approach are more acceptable than those generated by GenProg. PAR successfully generated patches for 27 out of 119 bugs, while GenProg was successful for only 16 bugs.
Software products are released with known and unknown bugs. For example, this paper by Kim et al. mentions that Windows 2000 was released with over 60K known bugs. Why? Bug repairs are, for the most part, done manually, and there are not enough resources to repair all of them. Automatic patch generation methods have been devised to reduce this manual effort.
To add to this field of research, Kim et al. integrate the human element with machine learning. They manually review human-written patches to create fix-pattern templates, which are then used to generate "candidate" patches. The goal is to avoid what they consider to be the "nonsensical patches" produced by GenProg (https://www.cs.virginia.edu/~weimer/p/weimer-icse2012-genprog-preprint.pdf). GenProg's approach to automated patch generation uses genetic programming to search the vast space of possible program repairs.
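To make the idea of a fix-pattern template concrete, here is a minimal sketch in Python. It is not PAR's implementation: the null-check template, the line-based program representation, and every name in it (null_check_template, generate_candidates, passes_tests, repair) are assumptions chosen for illustration, and the simple enumerate-and-test loop stands in for whatever search the paper actually uses.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy representation: a program is a list of source lines.
Program = List[str]
Test = Callable[[Dict], bool]


def null_check_template(program: Program, line_no: int, var: str) -> Program:
    """Illustrative fix template: guard one statement with `if var is not None`."""
    stmt = program[line_no]
    indent = stmt[: len(stmt) - len(stmt.lstrip())]
    guarded = [
        f"{indent}if {var} is not None:",
        f"{indent}    {stmt.lstrip()}",
    ]
    return program[:line_no] + guarded + program[line_no + 1:]


def generate_candidates(program: Program, suspicious: List[int],
                        variables: List[str]) -> List[Program]:
    """Apply the template at each suspicious line for each variable used there."""
    return [
        null_check_template(program, n, v)
        for n in suspicious
        for v in variables
        if v in program[n]
    ]


def passes_tests(program: Program, tests: List[Test]) -> bool:
    """Run the candidate and report whether every test case passes."""
    namespace: Dict = {}
    try:
        exec("\n".join(program), namespace)
        return all(test(namespace) for test in tests)
    except Exception:
        return False


def repair(program: Program, suspicious: List[int], variables: List[str],
           tests: List[Test]) -> Optional[Program]:
    """Return the first candidate patch that passes all tests, if any."""
    for candidate in generate_candidates(program, suspicious, variables):
        if passes_tests(candidate, tests):
            return candidate
    return None


# Toy buggy program: crashes when `user` is None.
BUGGY = [
    "def name_length(user):",
    "    length = 0",
    "    length = len(user)",
    "    return length",
]

TESTS = [
    lambda ns: ns["name_length"]("abc") == 3,
    lambda ns: ns["name_length"](None) == 0,
]

patched = repair(BUGGY, suspicious=[2], variables=["user"], tests=TESTS)
```

Because every candidate is an instance of a pattern a human might plausibly write, the search space is much smaller than with random mutation, which is the intuition behind the claim that template-based patches are less likely to be nonsensical.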
Two interesting aspects of the Kim et al. paper are that the fix patterns derived from one project's patches are used successfully on other projects, and that the fix-pattern templates, once created, can be used repeatedly. However, one element not measured in this paper is the monetary cost of fixing each bug, which the GenProg work did measure. Considering the bounties some companies offer for bug fixes, it would be interesting to see such a result included in future automatic patch generation methods.