It Will Never Work in Theory

Crowd Documentation

Posted Jun 5, 2012 by Jorge Aranda

| Documentation | Mining | Quantitative Studies |

Chris Parnin, Christoph Treude, Lars Grammel, and Margaret-Anne Storey. Crowd Documentation: Exploring the Coverage and the Dynamics of API Discussions on Stack Overflow. Georgia Tech Technical Report, 2012.

Traditionally, many types of software documentation, such as API documentation, require a process where a few people write for many potential users. The resulting documentation, when it exists, is often of poor quality and lacks sufficient examples and explanations. In this paper, we report on an empirical study to investigate how Question and Answer (Q&A) websites, such as Stack Overflow, facilitate crowd documentation — knowledge that is written by many and read by many. We examine the crowd documentation for three popular APIs: Android, GWT, and the Java programming language. We collect usage data using Google Code Search, and analyze the coverage, quality, and dynamics of the Stack Overflow documentation for these APIs. We find that the crowd is capable of generating a rich source of content with code examples and discussion that is actively viewed and used by many more developers. For example, over 35,000 developers contributed questions and answers about the Android API, covering 87% of the classes. This content has been viewed over 70 million times to date. However, there are shortcomings with crowd documentation, which we identify. In addition to our empirical study, we present future directions and tools that can be leveraged by other researchers and software designers for performing API analytics and mining of crowd documentation.

The process of figuring out how to use an API has changed radically since Q&A sites (and in particular Stack Overflow) came along. But to what extent can we depend on such sites for complete, speedy documentation? Parnin and colleagues looked into this and came back with some pretty interesting stats (a sample: 87% of the Android API's classes and 77% of the Java API's classes have at least one thread on Stack Overflow, and questions are answered in a median time of 11 minutes), as well as some visualization tools for you to play with: see Chris Parnin's blog for more details.
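To make the two headline numbers concrete, here is a minimal Python sketch of how class coverage and median time to answer could be computed once threads have already been linked to API classes. The `Thread` record, the toy class names and timestamps, and the choice to measure the delay to the *first* answer are illustrative assumptions; the sketch does not reproduce how the paper actually links Stack Overflow posts to API classes.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Thread:
    # Hypothetical record: the API classes a Q&A thread mentions,
    # when the question was asked, and when its first answer arrived
    # (None if it was never answered).
    classes: set[str]
    asked: datetime
    first_answer: datetime | None


def class_coverage(api_classes: set[str], threads: list[Thread]) -> float:
    """Share of API classes that appear in at least one thread."""
    mentioned = set().union(*(t.classes for t in threads))
    return len(api_classes & mentioned) / len(api_classes)


def median_minutes_to_first_answer(threads: list[Thread]) -> float:
    """Median delay, in minutes, from question to first answer.

    Unanswered threads are skipped (an assumption about how to treat them).
    """
    delays = [
        (t.first_answer - t.asked).total_seconds() / 60
        for t in threads
        if t.first_answer is not None
    ]
    return median(delays)


# Toy data, invented purely for illustration.
api = {"Activity", "Intent", "View", "Service"}
t0 = datetime(2012, 6, 1, 12, 0)
threads = [
    Thread({"Activity", "Intent"}, t0, t0 + timedelta(minutes=5)),
    Thread({"View"}, t0, t0 + timedelta(minutes=11)),
    Thread({"Activity"}, t0, t0 + timedelta(minutes=40)),
    Thread({"Intent"}, t0, None),
]

coverage = class_coverage(api, threads)
median_delay = median_minutes_to_first_answer(threads)
print(coverage)      # 0.75: three of the four classes have a thread
print(median_delay)  # 11.0
```

Coverage here is a plain set intersection, so a class counts as covered no matter how many threads mention it; that mirrors the "at least one thread" phrasing above. Note that `class_coverage` divides by the number of API classes, so it assumes a non-empty class list.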

(Disclaimer: I'm currently associated with Dr. Storey's lab. However, I did not participate in this research.)
