A Dataset of Python Code Change Patterns

Reviewed by Greg Wilson / 2023-04-22
Keywords: Dataset

One of the most effective ways to encourage collaboration is to share your code and data, and that's what the authors of this paper have done. They have assembled a dataset of generalizable Python code change patterns, each of which is annotated with a description of what it does, why it is applied, and where it occurred. I can think of half a dozen ways to use this, and I'm sure many readers can think of more. If you'd like to play with the dataset, you can download it from Figshare.

Akalanka Galappaththi and Sarah Nadi. A data set of generalizable Python code change patterns. 2023. arXiv:2304.04983.

Mining repetitive code changes from version control history is a common way of discovering unknown change patterns. Such change patterns can be used in code recommender systems or automated program repair techniques. While such tools and datasets exist for Java, there is little work on finding and recommending such changes in Python. In this paper, we present a data set of manually vetted generalizable Python repetitive code change patterns. We create a coding guideline to identify generalizable change patterns that can be used in automated tooling. We leverage the mined change patterns from recent work that mines repetitive changes in Python projects and use our coding guideline to manually review the patterns. For each change, we also record a description of the change and why it is applied, along with other characteristics such as the number of projects it occurs in. This review process allows us to identify and share 72 Python change patterns that can be used to build and advance Python developer support tools.
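To make "mining repetitive code changes" and "the number of projects it occurs in" a bit more concrete, here is a toy sketch. It is not the authors' pipeline: they reuse patterns mined by earlier work and then review them by hand. Real miners also work on fine-grained AST diffs rather than whole before/after snippets. The names `Change`, `normalize`, `mine_patterns`, and `min_projects` are hypothetical. The idea is to abstract away variable names and literals so that edits differing only in naming compare equal, then keep edits that recur across enough projects.

```python
import ast
import builtins
from collections import defaultdict
from dataclasses import dataclass

# Builtin names (range, len, enumerate, print, ...) carry meaning, so keep them.
_BUILTINS = set(dir(builtins))


@dataclass(frozen=True)
class Change:
    """Hypothetical record of one edit: the project it came from plus before/after source."""
    project: str
    before: str
    after: str


class _Abstract(ast.NodeTransformer):
    """Replace non-builtin variable references and literal constants with placeholders."""

    def visit_Name(self, node):
        if node.id in _BUILTINS:
            return node
        return ast.copy_location(ast.Name(id="ID", ctx=node.ctx), node)

    def visit_Constant(self, node):
        return ast.copy_location(ast.Constant(value="LIT"), node)


def normalize(src: str) -> str:
    """Parse a snippet and dump its abstracted AST (line numbers are not included)."""
    return ast.dump(_Abstract().visit(ast.parse(src)))


def mine_patterns(changes, min_projects=2):
    """Group edits with identical abstracted before/after shapes and keep those
    seen in at least `min_projects` distinct projects.

    Returns (first example change, number of distinct projects) pairs."""
    projects = defaultdict(set)
    examples = {}
    for c in changes:
        key = (normalize(c.before), normalize(c.after))
        projects[key].add(c.project)      # a set, so repeats within a project count once
        examples.setdefault(key, c)       # remember the first concrete instance
    return [(examples[k], len(p)) for k, p in projects.items() if len(p) >= min_projects]
```

Feeding it the same `range(len(...))` to `enumerate` rewrite from two different projects yields one candidate pattern with a project count of 2, while a one-off edit is filtered out. The manual vetting the paper describes would come after a step like this, deciding which candidates are actually generalizable.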