Software Engineering Research Questions

Reviewed by Greg Wilson / 2022-08-30
Keywords: Research Topics

Cross-posted from The Third Bit:

I have been collecting random software engineering research ideas from friends and colleagues for more than a decade; the questions below are the ones I’ve been asked since I started taking notes. I apologize for not keeping track of who wanted to know, but if you’re working on any of these, please get in touch and I’ll try to track down the original questioner.

  1. Does putting documentation in code (e.g., Python’s docstrings) actually work better than keeping the documentation in separate files, and if so, by what measure(s)?
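
    (Note: for concreteness, a minimal sketch of the in-code style, where the documentation lives beside the code it describes and tools such as help() and Sphinx’s autodoc can extract it automatically:)

    ```python
    def mean(values):
        """Return the arithmetic mean of a non-empty sequence of numbers.

        Because this text lives inside the function, help(mean) and
        documentation generators can find it without a separate file.
        """
        return sum(values) / len(values)
    ```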

  2. Do doctest-style tests (i.e., tests embedded directly in the code being tested) have any impact on long-term usability or maintainability compared to putting tests in separate files?
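
    (Note: a minimal example of the embedded style, where the tests live in the docstring and Python’s doctest module executes them:)

    ```python
    def double(x):
        """Return twice x.

        >>> double(2)
        4
        >>> double("ab")
        'abab'
        """
        return x * 2

    if __name__ == "__main__":
        import doctest
        doctest.testmod()  # runs every example embedded in the docstrings
    ```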

  3. Which tasks do developers collaborate on most often and which do they do solo most often? (If I’m reading my handwriting correctly, the questioner hypothesized that programmers routinely do bug triage in groups, but usually write new code alone, with other tasks falling in between.)

  4. Are slideshows written using HTML- or Markdown-based tools more text-intensive than those written in PowerPoint? In particular, are slides written in formats that version control understands (text) less likely to use diagrams than slides written with GUI tools?

  5. A lot of code metrics have been developed over the years; are there any for measuring/ranking the difficulty of getting software installed and configured?

  6. How does the percentage of effort devoted to tooling and deployment change as a project grows and/or ages? And how has it changed as we’ve moved from desktop applications to cloud-based applications? (Note: coming back to full-time coding after a decade away, my impression is that we’ve gone from packaging or building an installer taking 10% of effort to cloud deployment infrastructure being 25-30% of effort, but that’s just one data point.)

  7. Has anyone developed a graphical notation for software development processes like this one for game play?

  8. How do open source projects actually track and manage requirements or user needs? Do they use issues, is it done through discussion threads on email or chat, do people write wiki pages or PEPs, etc.?

  9. Has anyone ever done a quantitative survey of programming books aimed at professionals (i.e., not textbooks) to find out what people in industry care enough to write about or think others care about?

  10. Has anyone ever done a quantitative survey of the data structures used in undergraduate textbooks for courses that aren’t about data structures? I.e., do we know what data structures students are shown in their “other” courses?

  11. Has anyone ever compared a list of things empirical software engineering research has “proven” (ranked by confidence) versus a list of things programmers believe (similarly ranked)?

  12. Has anyone ever done a quantitative survey of how many claims in the top 100 software development books are backed by citations, and of those, how many are still considered valid?

  13. Are there any metrics for code fitness that take process and team into account? (I actually have the source for this one.)

  14. Which of the techniques catalogued in The Discussion Book are programmers familiar with? Which ones do they use informally (i.e., without explicit tool support), and how do they operationalize them?

  15. Is there a graphical notation like UML to show the problems you’re designing around or the special cases you’ve had to take into account rather than the finished solution to the problem (other than complete UML diagrams of the solutions you didn’t implement)?

  16. Ditto for architectural evolution over time: is there an explicit notation for “here’s how the system has changed”, and if so, can it show multiple changes in a single diagram or is it just stepwise?

  17. The Turing Test classifies a machine as “intelligent” if an independent observer can’t distinguish between it and a human being in conversation. Has anyone ever implemented a similar test for malicious software (which we should call the Hoye Test in honor of the person who proposed it, or the Moses Test in “honor” of the person who inspired it):
    1. Pick an application (e.g., Twitter).
    2. Build a work-alike that is deliberately malicious in some way (e.g., designed to radicalize its users).
    3. Have people selected at random use both and then guess which is which.

  18. Has anyone ever summarized the topics covered by ACM Doctoral Dissertation Award winners to see what computer science is actually about? (A subject is defined by what it gives awards for…)

  19. Has anyone ever surveyed developers to find out what the most boring part of their job is?

  20. Is there data anywhere on speakers’ fees at tech conferences broken down by age, subject, gender, and geography?

  21. Are programmers with greenery or mini-gardens in the office happier and/or more productive than programmers with foosball tables? What about programmers working from home: does the presence of greenery and/or pets make a difference?

  22. How much do software engineering managers know about organizational behavior and/or social psychology? What mistruths and urban myths do they believe?

  23. Has anyone ever compared how long it takes to reach a workable level of understanding of a software system with and without UML diagrams or other graphical notations? More generally, is there any correlation between the amount or quality of different kinds of developer-oriented documentation and time-to-understanding, and if so, which kinds of documentation fare best?

  24. Is it possible to trace the genealogy of the slide decks used in undergrad software engineering classes (i.e., figure out who is adapting lessons originally written by whom)? If so, how does the material change over time?

  25. How do people physically organize coding lessons when using static site generators? For example, do they keep example programs in the same directory or subdirectory as the slides, or keep the slides in one place and the examples in another? And how do they handle incremental evolution of examples, where the first lesson builds a simple version of X, the next lesson changes some parts but leaves others alone, etc.?

  26. Has anyone ever applied security analysis techniques to emerging models of peer review to (for example) anticipate ways in which different kinds of open review might be gamed?

  27. Has anyone ever written a compare-and-contrast feature analysis of tools for building documentation and tutorials? For example, how do Sphinx, Jekyll, and roxygen stack up?

  28. Käfer et al.’s paper comparing text and video tutorials for learning new software tools was interesting: has anyone done a follow-up?

  29. Bjarnason et al.’s paper on retrospectives was interesting: has anyone looked in more detail at what developers discuss in retrospectives and (crucially) what impact that has?

  30. Has anyone studied adoption over time of changes (read: fixes) to Git’s interface? For example, how widely is git switch actually being used now? And how do adopters find out about it?

  31. Same questions for adoption of new CSS features.

  32. Is there any correlation between the length of a project’s README file and how widely that software is used? If so, which drives which: does a more detailed README drive adoption or does adoption spur development of a more detailed README?

  33. Do any programming languages use one syntax for assigning an initial value to a variable and another syntax for updating that value, and if so, does distinguishing the two cases help? (Note: I think the person asking this question initially assumed that Python’s new := operator could only be used to assign an initial value.)
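
    (Note: a quick illustration that Python’s := does not make that distinction; it assigns and yields a value whether or not the name is already bound. Some languages do distinguish the two, e.g., OCaml binds a name with let and updates a mutable reference with :=.)

    ```python
    import random

    # := creates the binding here, since n does not exist yet...
    if (n := random.randint(0, 10)) > 5:
        print(f"first assignment: {n}")

    # ...and the very same operator re-binds the existing name, so Python
    # does not syntactically separate initialization from update.
    while (n := n - 1) > 0:
        print(f"update: {n}")
    ```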

  34. How, when, and why do people move from one open source project to another? For example, do they tend to move from a project to one of its dependencies or one of the projects that depends on it? And do they tend to keep the same role in the new project or use the switch as an opportunity to change roles?

  35. How often do developers do performance profiling, what do they measure, and how do they measure it?

  36. Has anyone ever created something like Sajaniemi’s roles of variables for refactoring steps or test cases? (Note: the person asking the question is a self-taught programmer who found Gamma et al.’s book a bit intimidating, and is looking for beginner-level patterns.)

  37. Has anyone defined a set of design patterns for the roles that columns play in dataframes during a data analysis?

  38. (How) does team size affect the proportion of time spent on planning and the accuracy of plans?

  39. Is there any way to detect altruism in software teams (i.e., how much time developer A spends helping developer B even though B’s problem isn’t officially A’s concern)? If so, is there any correlation between altruism and (for example) staff turnover or the long-term maintainability of the code base?

  40. Is there any correlation between the quality of the error messages in a software system and the quality of the community? (Note: by “quality of the community”, I believe the questioner meant things like “welcoming to newcomers” and “actually enforces its code of conduct”.)

  41. If you collect data from a dozen projects and guess from the data alone which teams think they’re doing agile and which don’t, is there anything more than a weak correlation with what team members tell you they think they’re following? I.e., are different development methodologies distinct rhetorically but not practically?

  42. What are students taught about debugging after their introductory courses? How much of what they’re explicitly taught is domain-specific (e.g., “how to debug a graphics pipeline”)?

  43. Can we assess students’ proficiency with tools by watching screencasts of their work? And can we do it efficiently enough to make it a feasible way to grade how they code (as well as the code they write)?

  44. A lot of people have built computational notebooks based on text formats (like Markdown) or that run in the browser. Has anyone built a computational notebook starting with Microsoft Word or OpenOffice, i.e., embedded runnable code chunks and their output in a rich document?

  45. When people write essay-length explanations about error handling or database internals, how do they decide what’s worth explaining? Is it “I struggled to figure this out and want to save you the pain” or “I’m trying to build my reputation as an expert in this field” or something else?

  46. Has anyone done a study that plots when people get funded on a loose timeline of “building a startup”, broken out by founders’ characteristics? I.e., if 0 is “I have an idea” and 100 is a fully functioning company, where do most Black/brown founders get funded vs. other POC founders vs. white founders?

  47. Has anyone analyzed videos of coding clubs for children or teens to see if girls are treated differently than boys by instructors and by their peers?

  48. How does the distribution of language constructs actually used in large programs vary by language? For example, if we plot percentage of programs that use feature X in a language, ordered by decreasing frequency, how do the curves for different languages compare?
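
    (Note: one way to operationalize this for Python, sketched below against a hypothetical corpus/ directory of .py files, is to record which AST node types appear in each program and then report the fraction of programs using each construct:)

    ```python
    import ast
    from collections import Counter
    from pathlib import Path

    def constructs_used(source):
        """Return the set of AST node types appearing in one program."""
        return {type(node).__name__ for node in ast.walk(ast.parse(source))}

    programs = list(Path("corpus").glob("**/*.py"))  # hypothetical corpus
    counts = Counter()
    for path in programs:
        try:
            counts.update(constructs_used(path.read_text(encoding="utf-8")))
        except SyntaxError:
            continue  # skip files that don't parse

    # Percentage of programs using each construct, by decreasing frequency.
    for construct, n in counts.most_common():
        print(f"{construct}: {100 * n / len(programs):.1f}%")
    ```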

  49. Is it possible to calculate something like a Gini coefficient to see how effectively scientists use computing? If so, is inequality static, decreasing, or increasing? (Note: the questioner felt strongly that the most proficient scientists are getting better at programming but the vast majority haven’t budged in the last three decades, so the gap between “median” and “best” is actually widening.)
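
    (Note: the coefficient itself is a few lines of code; the hard part is defining the underlying effectiveness score, which the sketch below simply assumes as a list of non-negative numbers:)

    ```python
    def gini(values):
        """Gini coefficient: 0 means perfect equality; values near 1 mean
        a few individuals account for almost all of the total."""
        xs = sorted(values)
        n, total = len(xs), sum(xs)
        if n == 0 or total == 0:
            return 0.0
        # Standard rank-weighted formula over the sorted values.
        weighted = sum((i + 1) * x for i, x in enumerate(xs))
        return (2 * weighted) / (n * total) - (n + 1) / n

    # Hypothetical proficiency scores for five scientists.
    print(gini([1, 1, 1, 1, 20]))  # about 0.63: one person dominates
    ```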

  50. If you train a Markov text generator on your software’s documentation, generate some fake man pages, and give users a mix of real and fake pages, can they tell which are which?
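
    (Note: the generator itself is almost trivially small, which is part of what makes this experiment cheap to run; a minimal bigram sketch, trained here on a stand-in string rather than on real documentation:)

    ```python
    import random
    from collections import defaultdict

    def train(text):
        """Map each word to the list of words observed to follow it."""
        words = text.split()
        successors = defaultdict(list)
        for a, b in zip(words, words[1:]):
            successors[a].append(b)
        return successors

    def generate(successors, start, length=30):
        """Random-walk the successor table to produce plausible text."""
        word, out = start, [start]
        for _ in range(length):
            choices = successors.get(word)
            if not choices:
                break
            word = random.choice(choices)
            out.append(word)
        return " ".join(out)

    # Stand-in corpus; the real experiment would train on actual man pages.
    corpus = "the option sets the mode and the option sets the flag"
    print(generate(train(corpus), "the"))
    ```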

  51. How does the number of (active) Slack channels in an organization grow as a function of time or of the number of employees?

  52. How well are software engineering researchers able to summarize each other’s work based solely on the abstracts of their research papers, and how does that compare to researchers in other domains?

  53. Second-line tech support staff often spend a lot of time explaining how things work in general so that they can solve a specific problem. How do they tell how much detail they need to go into?

  54. Is there a notation like CSS selectors for selecting parts of a program to display in tutorials? (Note: I’ve used several systems that relied on specially-formatted comments to slice sections out of programs for display; the questioner was using one of these for the first time and wondering if there was something simpler, more robust, or more general.)
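
    (Note: the comment-based approach looks roughly like the sketch below; the “# region:” / “# endregion” marker syntax is invented for illustration, since each real tool defines its own:)

    ```python
    def extract_region(source, name):
        """Return the lines between '# region: <name>' and '# endregion'."""
        keep, out = False, []
        for line in source.splitlines():
            if line.strip() == f"# region: {name}":
                keep = True
            elif line.strip() == "# endregion":
                keep = False
            elif keep:
                out.append(line)
        return "\n".join(out)

    example = """\
    setup = make_setup()
    # region: core
    result = run(setup)
    # endregion
    teardown(result)
    """
    print(extract_region(example, "core"))  # prints only the middle line
    ```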

  55. How does the order in which people write code differ from the order in which they explain code in a tutorial and why?

  56. Has anyone built a computational notebook that presents a two-column display with the code on the left and commentary on the right? If so, how does that change what people do or how they do it?

  57. Is it possible to extract entity-relationship diagrams from programs that use Pandas or the tidyverse to show how dataframes are being combined (e.g., to infer foreign key relationships)?
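
    (Note: a crude static-analysis starting point would be to scan the source for .merge(...) calls with Python’s ast module and record the join keys as candidate foreign-key relationships, as sketched below:)

    ```python
    import ast
    import textwrap

    # Hypothetical analysis script under study.
    code = """
    orders = orders.merge(customers, on="customer_id")
    items = orders.merge(products, left_on="product_id", right_on="id")
    """

    # Report every .merge(...) call along with its join-key arguments
    # (ast.unparse needs Python 3.9 or later).
    for node in ast.walk(ast.parse(textwrap.dedent(code))):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "merge"):
            left = ast.unparse(node.func.value)
            right = [ast.unparse(a) for a in node.args]
            keys = {kw.arg: ast.unparse(kw.value) for kw in node.keywords
                    if kw.arg in ("on", "left_on", "right_on")}
            print(f"{left} joins {right} using {keys}")
    ```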

  58. What percentage of their time do developers spend debugging, and how does that vary by the kind of code they’re working on?

  59. At what point is it more economical to throw away a module and write a replacement instead of refactoring or extending the module to meet new needs?

  60. Are SQL statements written in execution order easier for novices to understand or less likely to be buggy than ones written in standard order? (Note: the questioner was learning SQL after learning to manipulate dataframes with the tidyverse, and found the out-of-order execution of SQL confusing after the in-order execution of tidyverse pipelines.)
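
    (Note: to make the contrast concrete, SQL is written SELECT ... FROM ... WHERE ... GROUP BY, but the engine conceptually evaluates FROM first and SELECT almost last, whereas a dataframe pipeline executes in the order it is written. A pandas version of the same query:)

    ```python
    import pandas as pd

    sales = pd.DataFrame({
        "region": ["north", "north", "south", "south"],
        "amount": [10, 20, 30, 40],
    })

    # Written SQL order:  SELECT region, SUM(amount) FROM sales
    #                     WHERE amount > 15 GROUP BY region
    # Evaluation order:   FROM -> WHERE -> GROUP BY -> SELECT
    # The pipeline below is written in the order it executes:
    result = (sales[sales["amount"] > 15]           # WHERE
              .groupby("region", as_index=False)    # GROUP BY
              .agg(total=("amount", "sum")))        # SELECT + aggregate
    print(result)
    ```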

  61. Which error recovery techniques are used in which languages and applications, and how often?

  62. What labels do people define for GitHub issues and pull requests, and do they take those labels with them to new projects or re-think them for each project?

  63. Has anyone ever taught software engineering ethics by:
    1. Creating a set of scenarios, each with multiple-choice options.
    2. Having an ethics expert determine the best answer for each.
    3. Having students and professionals answer the same questions.
    4. Analyzing the results to see how well each group matches the experts’ opinions and whether practitioners are any better than students.

  64. Has anyone ever studied students from the first year to the final year of their program to see which tools they actually start using, and when? In particular, when (if ever) do they start to use more advanced features of their IDE (e.g., “rename variable in scope”)?

  65. Underrepresented groups often develop “whisper networks” to share essential knowledge (e.g., a young woman joining a company might be taken aside for an off-the-record chat by an older colleague and cautioned about the behavior of certain senior male colleagues). How have these networks changed during the COVID-19 lockdown?