Errors in SQL Queries

Reviewed by Greg Wilson / 2021-08-27
Keywords: Programming Languages, SQL

The abstract to Taipalus2018 starts by saying, "SQL is taught in almost all university level database courses, yet SQL has received relatively little attention in educational research." To which I can only add a shamefaced, "Guilty." I had been programming professional for 20 years before I wrote my first SQL query, and while I included a little bit of SQL in a couple of books I worked on, it was another decade before I took it seriously. I was guilty of the same kind of snobbery as people who look down on spreadsheets, CSS, and other tools that are "merely useful", which is why I'm very glad to see papers like theirs and Miedema2021.

The authors of these papers both examined the mistakes in students' queries in order to classify them and (in Miedema2021) dig into the misconceptions that led to the errors. Some of the problems are syntactic; others have to do with things like using AND instead of OR, trying to join tables on columns that aren't related, or attempts to write a shorter or simpler query than the problem needed. I wasn't surprised by any of the categories, but I doubt anyone would have been able to create this list from first principles, and I believe any made-up list would also have included problems that didn't actually show up.

But since we're stuck with SQL as it is, what's the point of knowing what its problems are? My first answer is that it can help us write better error messages; my second is that it'll help us create better lessons, and the third is that when we build other interfaces for manipulating tabular data, we can try to make these errors less possible. I hope that everyone writing object-relational mappers and user-facing data query systems will take these findings to heart.

Taipalus2018 Toni Taipalus, Mikko Siponen, and Tero Vartiainen: "Errors and Complications in SQL Query Formulation". ACM Transactions on Computing Education, 18(3), 2018, 10.1145/3231712.

SQL is taught in almost all university level database courses, yet SQL has received relatively little attention in educational research. In this study, we present a database management system independent categorization of SQL query errors that students make in an introductory database course. We base the categorization on previous literature, present a class of logical errors that has not been studied in detail, and review and complement these findings by analyzing over 33,000 SQL queries submitted by students. Our analysis verifies error findings presented in previous literature and reveals new types of errors, namely logical errors recurring in similar manners among different students. We present a listing of fundamental SQL query concepts we have identified and based our exercises on, a categorization of different errors and complications, and an operational model for designing SQL exercises.

Miedema2021 Daphne Miedema, Efthimia Aivaloglou, and George Fletcher: "Identifying SQL Misconceptions of Novices: Findings from a Think-Aloud Study". Proceedings of the 17th ACM Conference on International Computing Education Research, 10.1145/3446871.3469759.

SQL is the most commonly taught database query language. While previous research has investigated the errors made by novices during SQL query formulation, the underlying causes for these errors have remained unexplored. Understanding the basic misconceptions held by novices which lead to these errors would help improve how we teach query languages to our students. In this paper we aim to identify the misconceptions that might be the causes of documented SQL errors that novices make. To this end, we conducted a qualitative think-aloud study to gather information on the thinking process of university students while solving query formulation problems. With the queries in hand, we analyzed the underlying causes for the errors made by our participants. In this paper we present the identified SQL misconceptions organized into four top-level categories: misconceptions based in previous course knowledge, generalization-based misconceptions, language-based misconceptions, and misconceptions due to an incomplete or incorrect mental model. A deep exploration of misconceptions can uncover gaps in instruction. By drawing attention to these, we aim to improve SQL education.