Do Developers Really Know How to Use Git Commands?

Reviewed by Kolja Pluemer / 2022-09-19
Keywords: Stack Overflow, Tools, Usability

Git, aptly described by the paper as a cross-platform and open-source distributed version control tool, is as popular as it is confusing and misunderstood. At least, that is an often-expressed sentiment in the software development community. Now, the authors attempt to move this lament from a meme towards a quantitatively established web of validated effects. And they do a good job! In a way, this study may be seen as a complement to more qualitative approaches to the question of Git's usability.

Two figures in front of a laptop having the following conversation: 'This is Git. It tracks collaborative work on projects through a beautiful distributed graph theory tree model.' - 'Cool. How do we use it?' - 'No idea. Just memorize these shell commands and type them to sync up. If you get errors, save your work elsewhere, delete the project, and download a fresh copy.'
Comic expressing the cynical attitude many developers hold towards Git.

Yang, Zhang, Pan, Xu, Zhou and Huang use the popular developer Q&A platform Stack Overflow as well as a survey to gather large quantities of data. Armed with that, they state the following five aims:

  1. Establish how many questions (on Stack Overflow) relate to Git commands and how this changes over the years.
  2. Find out who is asking these questions: novices or seasoned devs?
  3. List which Git commands elicit the most questions.
  4. Furthermore, find out for which commands questions are hardest to get answered.
  5. Get an idea of how devs approach learning Git and how well they are doing.

To these ends, the researchers use a fairly complex and well-described pipeline of statistical tools. As far as I am able to tell, they follow best practices, including the discussion of limitations, potential pitfalls and irregularities in the data. Thus, the paper may serve as a positive example of eliciting answers from large data sets.

Let us then take a peek at the findings with regard to the five goals described above:

  1. Over 80,000 questions relate to Git commands at the time of writing. That's a lot! The percentage of Git-related questions on SO remains fairly stable over time, with their view count being far above that of the average question.
  2. Over 40% of people asking questions about Git have been on SO for over four years, suggesting this is not what we could call a "noob problem".
  3. The most viewed questions relate to git revert and git reflog, with stash, clean and reset being close contenders. Relatable, I would say.
  4. Threads with no Accepted Answer are unsurprisingly dominated by rarely used commands such as pack-redundant. When filtering for more commonly used commands, threads that discuss complex scenarios relating to commands like git submodule turn out to be the least answered.
  5. Supported by a survey, the authors find that almost all devs use a self-learning approach to Git. They often stress the importance of learning Git properly, while describing themselves mostly as 'advanced beginner' or 'competent'.
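
To make findings 3 and 4 a little more tangible, here is a minimal sketch of how one could tally questions per Git command and compute the share of threads without an accepted answer. This is not the authors' pipeline, which is considerably more elaborate. The sample records, the `KNOWN_COMMANDS` set and the helper names are all hypothetical and exist for illustration only; a real analysis would read the public Stack Overflow data instead.

```python
import re
from collections import Counter

# Hypothetical sample records: titles and flags are invented for illustration,
# they are NOT taken from the paper or from Stack Overflow.
QUESTIONS = [
    {"title": "How do I undo git reset --hard?", "accepted": True},
    {"title": "git submodule update fails after clone", "accepted": False},
    {"title": "Recover a dropped git stash", "accepted": True},
    {"title": "git submodule points to wrong commit", "accepted": False},
    {"title": "Difference between git revert and git reset", "accepted": True},
]

# Assumption: a small allow-list of commands named in this review.
KNOWN_COMMANDS = {"reset", "revert", "reflog", "stash", "clean", "submodule"}

# Matches the word that follows "git " in a lower-cased title.
COMMAND_PATTERN = re.compile(r"\bgit\s+([a-z][a-z-]*)")


def commands_in(title):
    """Return the set of known Git commands mentioned in a question title."""
    found = set(COMMAND_PATTERN.findall(title.lower()))
    return found & KNOWN_COMMANDS


def tally(questions):
    """Map each command to (question count, share without accepted answer)."""
    asked = Counter()
    unanswered = Counter()
    for question in questions:
        for command in commands_in(question["title"]):
            asked[command] += 1
            if not question["accepted"]:
                unanswered[command] += 1
    return {cmd: (asked[cmd], unanswered[cmd] / asked[cmd]) for cmd in asked}


if __name__ == "__main__":
    # Print commands sorted by their share of unanswered questions.
    for cmd, (count, share) in sorted(tally(QUESTIONS).items(),
                                      key=lambda item: -item[1][1]):
        print(f"{cmd}: {count} question(s), {share:.0%} without accepted answer")
```

Matching on titles alone is a crude heuristic: it misses questions that only mention a command in the body or in tags, which is one reason a study like this needs a more careful extraction step.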

With these findings in mind, the authors give some recommendations to researchers, educators and devs. Firstly, public SO data is confirmed to be a potent source for such studies. The authors encourage others both to use the very data they collected and to build upon the research. Secondly, findings such as the list of most 'difficult' commands may be used to design and adapt teaching programs, whether these are university courses or self-learning curricula. I definitely agree that there is a dire need for more and better targeted Git courses - the fact that most developers teach themselves and the fact that understanding of Git is widely elusive show that there is indeed room for improvement.

Overall, I can only repeat that this paper does a very thorough job of collecting relevant data. These findings can be readily used to aid the improvement of your own development skills, the design of learning material or even Git itself. I can only hope that more studies which also formulate solutions will follow.

Yang2022 Yang, W., Zhang, C., Pan, M., Xu, C., Zhou, Y., and Huang, Z.: "Do Developers Really Know How to Use Git Commands? A Large-Scale Study Using Stack Overflow" ACM Transactions on Software Engineering and Methodology (TOSEM) 31.3 (2022): 1-29. 10.1145/3494518

Git, a cross-platform and open-source distributed version control tool, provides strong support for non-linear development and is capable of handling everything from small to large projects with speed and efficiency. It has become an indispensable tool for millions of software developers and is the de facto standard of version control in software development nowadays. However, despite its widespread use, developers still frequently face difficulties when using various Git commands to manage projects and collaborate. To better help developers use Git, it is necessary to understand the issues and difficulties that they may encounter when using Git. Unfortunately, this problem has not yet been comprehensively studied. To fill this knowledge gap, in this paper, we conduct a large-scale study on Stack Overflow, a popular Q&A forum for developers. We extracted and analyzed 80,370 relevant questions from Stack Overflow, and reported the increasing popularity of the Git command questions. By analyzing the questions, we identified the Git commands that are frequently asked and those that are associated with difficult questions on Stack Overflow to help understand the difficulties developers may encounter when using Git commands. In addition, we conducted a survey to understand how developers learn Git commands in practice, showing that self-learning is the primary learning approach. These findings provide a range of actionable implications for researchers, educators, and developers.