Python Coding Style Compliance on Stack Overflow

Reviewed by Greg Wilson / 2021-10-01
Keywords: Programming Style

The findings in Bafatakis2019 were a real shock for me. The authors looked at over 400,000 Python code snippets on Stack Overflow that included six statements or more and checked them for coding style compliance. The results?

  • Almost 94% of snippets contained style violations. (Yes, you read that right.) The most common violation was bad whitespace.
  • Snippets average 0.7 violations per statement, and many have a much higher ratio.
  • User reputation is unrelated to coding style compliance.
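To make the "violations per statement" metric concrete, here is a minimal sketch of how such a ratio could be computed for one snippet. It is not the paper's tooling: the three checks below (line length, trailing whitespace, tab indentation) are toy stand-ins chosen for illustration, and the function names are hypothetical.

```python
import ast

MAX_LINE_LENGTH = 79  # PEP 8's default limit; an assumption of this sketch


def count_statements(source: str) -> int:
    """Count every statement node in the parsed snippet."""
    tree = ast.parse(source)
    return sum(isinstance(node, ast.stmt) for node in ast.walk(tree))


def count_violations(source: str) -> int:
    """Count toy style violations: one per failed check per line."""
    violations = 0
    for line in source.splitlines():
        indent = line[: len(line) - len(line.lstrip())]
        if len(line) > MAX_LINE_LENGTH:
            violations += 1  # line too long
        if line != line.rstrip():
            violations += 1  # trailing whitespace
        if "\t" in indent:
            violations += 1  # tab used for indentation
    return violations


def violations_per_statement(source: str) -> float:
    """Return violations divided by statements for one snippet."""
    statements = count_statements(source)
    if statements == 0:
        raise ValueError("snippet contains no statements")
    return count_violations(source) / statements


# Hypothetical snippet: trailing space on line 1, tab indent on line 4.
snippet = "x = 1 \ny = 2\nif x:\n\tprint(y)\n"
ratio = violations_per_statement(snippet)
```

A real style checker applies far more rules than this, so actual ratios would differ; the point is only that the metric normalizes violation counts by snippet size.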

The good news is that for posts with vote scores between -10 and 20 (which accounted for over 99% of posts) there was a strong negative correlation (r = -0.87) between the post's score and the average number of violations per statement, meaning that style-compliant posts generally had higher scores. Given how much programming now consists of "copy, paste, and modify," enforcing style conventions on Stack Overflow might do more to improve general coding standards than anything else we could try.
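As a rough illustration of what a negative correlation between score and violations means, the sketch below computes Pearson's r by hand. The numbers are made up purely for demonstration and are not the paper's data; `pearson_r` is a hypothetical helper, and the assumption that each score maps to one average ratio is mine.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values, NOT taken from the paper:
# one average violations-per-statement figure per vote score.
scores = [-2, 0, 1, 3, 5]
avg_violations = [0.9, 0.8, 0.75, 0.6, 0.5]

r = pearson_r(scores, avg_violations)
print(f"r = {r:.2f}")  # negative: higher scores go with fewer violations
```

A value of r near -1 says the relationship is close to a downward-sloping line; it does not by itself say that fixing style would raise a post's score.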

Bafatakis2019 Nikolaos Bafatakis, Niels Boecker, Wenjie Boon, Martin Cabello Salazar, Jens Krinke, Gazi Oznacar, and Robert White: "Python Coding Style Compliance on Stack Overflow". Proc. International Conference on Mining Software Repositories (MSR), 2019, 10.1109/msr.2019.00042.

Software developers all over the world use Stack Overflow (SO) to interact and exchange code snippets. Research also uses SO to harvest code snippets for use with recommendation systems. However, previous work has shown that code on SO may have quality issues, such as security or license problems. We analyse Python code on SO to determine its coding style compliance. From 1,962,535 code snippets tagged with 'python', we extracted 407,097 snippets of at least 6 statements of Python code. Surprisingly, 93.87% of the extracted snippets contain style violations, with an average of 0.7 violations per statement and a huge number of snippets with a considerably higher ratio. Researchers and developers should, therefore, be aware that code snippets on SO may not be representative of good coding style. Furthermore, while user reputation seems to be unrelated to coding style compliance, for posts with vote scores in the range between -10 and 20, we found a strong correlation (r = -0.87, p < 10^-7) between the vote score a post received and the average number of violations per statement for snippets in such posts.