Comparing Student and LLM Code Explanations

Reviewed by Greg Wilson / 2023-04-13
Keywords: Computing Education, Machine Learning

Asking whether Large Language Models (LLMs) are going to help or hurt education is about as sensible as asking whether the Internet has been good for society. This paper asks two more specific, and therefore more useful, questions: can LLMs produce explanations that will help novice programmers understand code, and are those explanations better, worse, or the same as ones produced by their peers? The answer to the first appears to be "yes", and to the second, "at least as good": machine-generated explanations aren't always correct or comprehensible, but neither are those produced by other students.

The authors do touch on some potential concerns, such as students becoming over-reliant on model-generated explanations and how to prevent that, but overall they're very positive about these tools' potential. It will probably be a few years before any of this has a significant impact on higher education, but I expect that most online/self-paced learn-to-code offerings are going to have to adapt quickly or find a new business model.

Disclosure: I co-authored a paper in 2019 with one of the authors of this paper (Denny).

Juho Leinonen, Paul Denny, Stephen MacNeil, Sami Sarsa, Seth Bernstein, Joanne Kim, Andrew Tran, and Arto Hellas. Comparing code explanations created by students and large language models. 2023. arXiv:2304.03938.

Reasoning about code and explaining its purpose are fundamental skills for computer scientists. There has been extensive research in the field of computing education on the relationship between a student's ability to explain code and other skills such as writing and tracing code. In particular, the ability to describe at a high level of abstraction how code will behave over all possible inputs correlates strongly with code writing skills. However, developing the expertise to comprehend and explain code accurately and succinctly is a challenge for many students. Existing pedagogical approaches that scaffold the ability to explain code, such as producing exemplar code explanations on demand, do not currently scale well to large classrooms. The recent emergence of powerful large language models (LLMs) may offer a solution. In this paper, we explore the potential of LLMs in generating explanations that can serve as examples to scaffold students' ability to understand and explain code. To evaluate LLM-created explanations, we compare them with explanations created by students in a large course (n ≈ 1000) with respect to accuracy, understandability and length. We find that LLM-created explanations, which can be produced automatically on demand, are rated as being significantly easier to understand and more accurate summaries of code than student-created explanations. We discuss the significance of this finding, and suggest how such models can be incorporated into introductory programming education.
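
For readers who want a feel for what "rated as significantly easier to understand" involves, here is a minimal sketch of how ratings like the paper's could be compared. The data is invented, and this is not the authors' actual analysis (the paper itself gives those details); it just assumes 1-5 Likert ratings and uses a Mann-Whitney U test, a common choice for ordinal ratings.

```python
# Hypothetical sketch: comparing Likert-scale "ease of understanding"
# ratings for student-created vs. LLM-created code explanations.
# The ratings below are made up for illustration only.
from scipy.stats import mannwhitneyu

# 1-5 Likert ratings (5 = easiest to understand) -- invented values
student_ratings = [3, 4, 2, 3, 5, 3, 2, 4, 3, 3]
llm_ratings = [4, 5, 4, 3, 5, 4, 4, 5, 3, 4]

# Mann-Whitney U is a standard nonparametric test for ordinal data;
# a small p-value suggests LLM explanations are rated higher.
stat, p = mannwhitneyu(llm_ratings, student_ratings, alternative="greater")
print(f"U = {stat}, p = {p:.3f}")
```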