It Will Never Work in Theory

Short summaries of recent results in empirical software engineering research

2021-10-18: Bad Practices in Continuous Integration
Keywords: Continuous Integration
Reviewed by: Greg Wilson

The slow but steady normalization of continuous integration (CI) has changed software development just as profoundly as reliance on Q&A sites: by the time I have merged a pull request into the main branch of one of the projects I'm working on, a dozen different checks and actions have run automatically, at least half of which could reject the merge. I believe this automation has made developers more productive, but like any tool it can be used badly, so Zampetti2020 used interviews and mined Stack Overflow posts to find out how. Their complete catalog divides 79 distinct smells into seven...

2021-10-17: Demystifying 'Bad' Error Messages in Data Science Libraries
Keywords: Data Science, Error Messages
Reviewed by: Eddie Antonio Santos

As a computing educator and an occasional software library developer, I often feel that a big portion of novice frustration when coding can be resolved by developers writing "better" error messages. Tao2021 provides evidence that dashes my dreams, but also provides actionable suggestions. The authors mined errors that were raised by six popular Python data science libraries: NumPy, pandas, SciPy, scikit-learn, TensorFlow, and Gensim. With this large list of possible errors, they found possible fixes by doing what most people I know would do: search for that error message on Stack Overflow. They categorize each error message as being either clear, uninformative,...

2021-10-16: Open Source Projects in Baidu, Alibaba, and Tencent
Keywords: Diversity, Open Source
Reviewed by: Greg Wilson

You don't realize what barriers people face if you've never had to get over them. I used to tell people that open source was a great leveller because everyone could contribute. I now realize that claim needs to be qualified with, "…provided they're affluent enough to have free time and good internet connectivity, speak English well enough to join a mostly monolingual conversation, and if they're not white and/or not male, willing to put up with a steady drizzle of disparagement or harassment." Han2021 is therefore a very welcome look at some of what's happening outside my bubble. As the...

2021-10-15: Authorship Attribution of Source Code
Keywords: Authorship, Machine Learning
Reviewed by: Greg Wilson

The question, "Who actually wrote this code?" comes up in many contexts, from plagiarism detection in schoolwork to design recovery in legacy systems. Bogomolov2021 presents two machine learning approaches to the problem using neural networks and random forests. Unlike most earlier work, these models operate on paths through the source code's abstract syntax tree (AST). The authors find that: their random forest approach outperforms the previous best result on C++, it matches the best performance of previous systems on Python, and both of their approaches outperform previous results on Java. I have reservations about how eagerly and uncritically some researchers...

2021-10-14: Exploring Programmers' API Learning Processes
Keywords: Cognition
Reviewed by: Greg Wilson

Today was my eighth day in my new job, and I have already had to come to grips with the APIs of half a dozen packages and web services. Figuring out what's available to call and what it will do is central to modern programming, so any research that helps us do it more efficiently is very welcome. Gao2020 is a preliminary observational study designed to help create a theoretical framework for that task. It draws on cognitive load theory, information foraging theory, and research into external memory (i.e., the ways in which we jot things down, draw sketches, and otherwise...

2021-10-13: An Empirical Study of Donations in Open Source
Keywords: Open Source, Sustainability
Reviewed by: Greg Wilson

Even as open source software becomes more widely used, there has been growing concern about its sustainability Eghbal2020. Volunteers can only do so much for so long: thousands of pieces of critical infrastructure only exist because a handful of people are willing to sacrifice their evenings and weekends, and sooner or later, they burn out, become disillusioned, or have to devote their attention to other things. Overney2020 looks at one model for funding their work: donations through platforms like PayPal and Patreon. The authors found: …25,885 projects asking for donations on GitHub…typically with the goal of supporting engineering activities. Many...

2021-10-11: A Critical History of Logo and Constructionist Learning
Keywords: Computers and Society, Computing Education
Reviewed by: Greg Wilson

I grew up reading second-hand copies of books by Asimov, Clarke, and Heinlein, and bought every novel Larry Niven wrote in the 1970s the day it came out in paperback. I daydreamed about a world run by scientists, or at least by people who were as excited by science as I was. It never occurred to me that other people might legitimately be excited about other things: as far as I was concerned, if they didn't think space travel and plate tectonics were the coolest things ever, it was only because they didn't understand them yet. When I started programming...

2021-10-10: Insights from Student Solutions to MongoDB Homework Problems
Keywords: Computing Education, Databases, Novices
Reviewed by: Donny Winston

Alkhabaz2021 tries to determine what concepts are difficult for students when learning MongoDB, and what common errors students make when first learning to query a MongoDB database. I came away not quite understanding the authors' conclusions, but I did encounter two observations that surprised me: 3.5% of student submissions "were omitted from this study, because they had a common error where students copied JavaScript code from online sources, causing the interpreter to fail due to an unexpected unicode character." That seems like a lot to me, and highlights the prevalence of "copy-and-paste" programming even in early computing education. 47% of...

2021-10-08: Do Hackathon Projects Change the World?
Keywords: Hackathons
Reviewed by: Greg Wilson

The short answer to the question posed by McIntosh2021's title is "probably not". They analyzed the GitHub repositories of almost twelve thousand hackathon projects from 2018–19 and found that: …approximately 85% of commits were made within the first month, and approximately 77% of the total commits occurred within the first week. Only 7% of projects had any activity 6 months after the event ended. Evaluated projects had an average of only 3.097 distinct commit dates… That said, this paper doesn't look at the long-term impact of hackathons on the participants themselves. In my experience, many people participate in hackathons to...

2021-10-07: How Do Software Developers Use GitHub Actions?
Keywords: Automation
Reviewed by: Greg Wilson

Today is my fourth day in the first full-time programming job I've had in ten years. Some things have stayed the same (meetings, t-shirts, and package management problems), but others have changed: I'm finally going to have to learn how to use Docker and cloud-based microservices, and where pre- and post-commit hooks were a rarity a decade ago, almost every step of development is now supported by automated checks. Kinsman2021 looks at how teams use GitHub Actions to implement those checks and support development workflows in other ways. They found that the most common operations are (in order) continuous integration,...