An Empirical Study of Obsolete Answers on Stack Overflow

Reviewed by Greg Wilson / 2022-03-08
Keywords: Stack Overflow

I've been writing extensions to a little static site generator called Ivy for some of my side projects. At one point I had half a dozen tabs open on Stack Overflow while figuring out the right way to customize code highlighting. The problem wasn't finding an answer: the problem was finding an answer that was up to date. Libraries change over time, but it takes a while for old answers with lots of upvotes to be superseded by newer, more accurate answers.

Zhang2021b looks at this phenomenon quantitatively. I wasn't surprised by their finding that only 20% of obsolete answers are ever updated; I was surprised to learn that more than half of obsolete answers were out of date even when they were first posted.

As the figures below show, this paper contains a wealth of information. And while some of its suggestions (e.g., telling answer seekers to check comments carefully for signs of obsolescence) fall into the same category as "You really should eat more vegetables," many of its proposals are actionable: for example, the authors point out that many obsolete answers could be detected as they are being typed in. If our live event in April goes well, I hope we will have a second session later in the year, and I hope that these results will be featured then.

Figure 4

Figure 8

Figure 10

Zhang2021b Haoxiang Zhang, Shaowei Wang, Tse-Hsun Chen, Ying Zou, and Ahmed E. Hassan. An empirical study of obsolete answers on Stack Overflow. IEEE Transactions on Software Engineering, 47(4):850–862, 2021, doi:10.1109/tse.2019.2906315.

Stack Overflow accumulates an enormous amount of software engineering knowledge. However, as time passes, certain knowledge in answers may become obsolete. Such obsolete answers, if not identified or documented clearly, may mislead answer seekers and cause unexpected problems (e.g., using an out-dated security protocol). In this paper, we investigate how the knowledge in answers becomes obsolete and identify the characteristics of such obsolete answers. We find that: 1) More than half of the obsolete answers (58.4%) were probably already obsolete when they were first posted. 2) When an obsolete answer is observed, only a small proportion (20.5%) of such answers are ever updated. 3) Answers to questions in certain tags (e.g., node.js, ajax, android, and objective-c) are more likely to become obsolete. Our findings suggest that Stack Overflow should develop mechanisms to encourage the whole community to maintain answers (to avoid obsolete answers) and answer seekers are encouraged to carefully go through all information (e.g., comments) in answer threads.