Enriching API Documentation with Code Samples and Usage Scenarios

Reviewed by Sakib Hasan / 2021-10-26
Keywords: Crowdsourcing, Documentation

If you are a programmer, searching Stack Overflow for an Application Programming Interface (API) code sample to solve a problem is hardly a novel activity. But why is searching more popular than consulting the APIs' official documentation? Simply put, that documentation often contains only the API's definitions and lacks usage scenarios and relevant worked examples.

Zhang2021a attempts to close this gap by introducing ADECK, an algorithm that enriches API documentation with usage scenarios by mining a crowdsourcing platform. ADECK extracts API classes from their official documentation, scrapes Stack Overflow data and keeps the Question (usage scenario) and Answer (code sample) pairs that involve those classes, and then clusters similar Question-Answer pairs based on use-cases. Finally, the algorithm builds enriched API documentation from these clustered pairs.
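The three steps above (collect API classes, keep the matching Q&A pairs, cluster them by use case) can be sketched roughly as follows. This is only an illustration and not the authors' implementation: the sample data, the function names, the identifier-matching filter, and the greedy Jaccard clustering with a 0.5 threshold are all assumptions made for this sketch, and the paper's actual filtering and clustering techniques may differ.

```python
import re
from collections import defaultdict

# Hypothetical inputs. A real pipeline would take the class names from the
# official API reference and the Q&A pairs from Stack Overflow data.
API_CLASSES = {"ArrayList", "HashMap"}

QA_PAIRS = [
    {"question": "How to sort an ArrayList of strings",
     "code": "ArrayList<String> names = new ArrayList<>(); Collections.sort(names);"},
    {"question": "How to sort an ArrayList of integers",
     "code": "ArrayList<Integer> nums = new ArrayList<>(); Collections.sort(nums);"},
    {"question": "Remove duplicates from ArrayList",
     "code": "ArrayList<String> unique = new ArrayList<>(new LinkedHashSet<>(names));"},
    {"question": "Iterate over HashMap entries",
     "code": "HashMap<String, Integer> map = new HashMap<>(); map.forEach((k, v) -> {});"},
    {"question": "Center a div with CSS",
     "code": "div { margin: 0 auto; }"},
]


def mentioned_classes(code, api_classes):
    """Return the API class names that occur as whole identifiers in the code."""
    identifiers = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code))
    return identifiers & api_classes


def tokens(text):
    """Lowercased word set of a question, used as a crude usage-scenario signature."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_pairs(pairs, threshold=0.5):
    """Greedy one-pass clustering: a pair joins the first cluster whose
    first question is similar enough, otherwise it starts a new cluster."""
    clusters = []
    for pair in pairs:
        for cluster in clusters:
            seed = cluster[0]["question"]
            if jaccard(tokens(pair["question"]), tokens(seed)) >= threshold:
                cluster.append(pair)
                break
        else:
            clusters.append([pair])
    return clusters


def enrich(api_classes, qa_pairs, threshold=0.5):
    """Map each API class to clusters of Q&A pairs whose code mentions it."""
    by_class = defaultdict(list)
    for pair in qa_pairs:
        for cls in mentioned_classes(pair["code"], api_classes):
            by_class[cls].append(pair)
    return {cls: cluster_pairs(pairs, threshold) for cls, pairs in by_class.items()}


if __name__ == "__main__":
    for cls, clusters in sorted(enrich(API_CLASSES, QA_PAIRS).items()):
        print(cls)
        for i, cluster in enumerate(clusters, start=1):
            print(f"  scenario {i}:")
            for pair in cluster:
                print(f"    - {pair['question']}")
```

In this toy data, the two sorting questions end up in one scenario for ArrayList, the duplicate-removal question forms a second one, and the CSS question is dropped because its code mentions no known API class.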

The authors evaluated ADECK with graduate student subjects and compared the results against eXoaDocs Kim2013b, an algorithm with the same objective. They found that:

  • The number of API types illustrated with code samples in the documentation produced by ADECK is much higher than the number in the raw documentation for Java SE and the Android API (3.35 and 5.76 times as many).
  • The code samples collected by ADECK are more concise, correct, and usable than those collected by eXoaDocs.
  • Users are more productive with the ADECK-enriched documentation than with the raw and the eXoaDocs-enriched documentation.

ADECK is trying to shift the focus from consulting crowdsourced Q&A platforms to documenting API use-cases with the help of those platforms. While it is not yet ready for real-world use, the findings of the study show how promising this approach could be.

Zhang2021a Jingxuan Zhang, He Jiang, Zhilei Ren, Tao Zhang, and Zhiqiu Huang: "Enriching API Documentation with Code Samples and Usage Scenarios from Crowd Knowledge". IEEE Transactions on Software Engineering, 47(6), 2021, 10.1109/tse.2019.2919304.

As one key resource to learn Application Programming Interfaces (APIs), a lot of API reference documentation lacks code samples with usage scenarios, thus heavily hindering developers from programming with APIs. Although researchers have investigated how to enrich API documentation with code samples from general code search engines, two main challenges remain to be resolved, including the quality challenge of acquiring high-quality code samples and the mapping challenge of matching code samples to usage scenarios. In this study, we propose a novel approach named ADECK towards enriching API documentation with code samples and corresponding usage scenarios by leveraging crowd knowledge from Stack Overflow, a popular technical Question and Answer (Q&A) website attracting millions of developers. Given an API related Q&A pair, a code sample in the answer is extensively evaluated by developers and targeted towards resolving the question under the specified usage scenario. Hence, ADECK can obtain high-quality code samples and map them to corresponding usage scenarios to address the above challenges. Extensive experiments on the Java SE and Android API documentation show that the number of code-sample-illustrated API types in the ADECK-enriched API documentation is 3.35 and 5.76 times as many as that in the raw API documentation. Meanwhile, the quality of code samples obtained by ADECK is better than that of code samples by the baseline approach eXoaDocs in terms of correctness, conciseness, and usability, e.g., the average correctness values of representative code samples obtained by ADECK and eXoaDocs are 4.26 and 3.28 on a 5-point scale in the enriched Java SE API documentation. 
In addition, an empirical study investigating the impacts of different types of API documentation on the productivity of developers shows that, compared against the raw and the eXoaDocs-enriched API documentation, the ADECK-enriched API documentation can help developers complete 23.81 and 14.29 percent more programming tasks and reduce the average completion time by 9.43 and 11.03 percent.