Learning and Developing with SQL

Reviewed by Greg Wilson / 2023-02-25
Keywords: SQL

What do SE researchers study? One recent finding is that, "…SQL development tasks have a significantly longer time-to-completion than SQL-unrelated tasks and require significantly more code changes." Another is that, "…students are hindered in their query formulation process by mismanaging complexity through writing overly elaborate queries…" Findings like these can and should shape how we plan work, do code reviews, and teach database courses.

Daniel Alencar da Costa, Natalie Grattan, Nigel Stanger, and Sherlock A. Licorish. Studying the characteristics of SQL-related development tasks: an empirical study. 2023. arXiv:2301.10315.

A key function of a software system is its ability to facilitate the manipulation of data, which is often implemented using a flavour of the Structured Query Language (SQL). To develop the data operations of software (i.e, creating, retrieving, updating, and deleting data), developers are required to excel in writing and combining both SQL and application code. The problem is that writing SQL code in itself is already challenging (e.g., SQL anti-patterns are commonplace) and combining SQL with application code (i.e., for SQL development tasks) is even more demanding. Meanwhile, we have little empirical understanding regarding the characteristics of SQL development tasks. Do SQL development tasks typically need more code changes? Do they typically have a longer time-to-completion? Answers to such questions would prepare the community for the potential challenges associated with such tasks. Our results obtained from 20 Apache projects reveal that SQL development tasks have a significantly longer time-to-completion than SQL-unrelated tasks and require significantly more code changes. Through our qualitative analyses, we observe that SQL development tasks require more spread out changes, effort in reviews, and documentation. Our results also corroborate previous research highlighting the prevalence of SQL anti-patterns. The software engineering community should make provision for the peculiarities of SQL coding, in the delivery of safe and secure interactive software.

Daphne Miedema, George Fletcher, and Efthimia Aivaloglou. So many brackets!: an analysis of how SQL learners (mis)manage complexity during query formulation. In Proc. International Conference on Program Comprehension. ACM, May 2022. doi:10.1145/3524610.3529158.

The Structured Query Language (SQL) is a widely taught database query language in computer science, data science, and software engineering programs. While highly expressive, SQL is challenging to learn for novices. Various research has explored the errors and mistakes that SQL users make. Specific attributes of SQL code, such as the number of tables and the degree of nesting, have been found to impact its understandability and maintainability. Furthermore, prior studies have shown that novices have significant issues using SQL correctly, due to factors such as expressive ease, existing knowledge and misconceptions, and the impact of cognitive load. In this paper we identify another factor: self-inflicted query complexity, where users hinder their own problem solving process. We analyse 8K intermediate and final student attempts to six SQL exer-cises, approaching complexity from four perspective: correctness, execution order, edit distance and query intricacy. Through our analyses, we find that our students are hindered in their query formulation process by mismanaging complexity through writing overly elaborate queries containing unnecessary elements, overusing brackets and nesting, and incrementally building queries with persistent errors.