Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results

Reviewed by Greg Wilson / 2012-05-18
Keywords: Reproducibility

Wicherts2011 Jelte M. Wicherts, Marjan Bakker, and Dylan Molenaar: "Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results". PLoS ONE, 6(11), 2011, doi:10.1371/journal.pone.0026828.

Background: The widespread reluctance to share published research data is often hypothesized to be due to the authors' fear that reanalysis may expose errors in their work or may produce conclusions that contradict their own. However, these hypotheses have not previously been studied systematically.

Methods and Findings: We related the reluctance to share research data for reanalysis to 1148 statistically significant results reported in 49 papers published in two major psychology journals. We found the reluctance to share data to be associated with weaker evidence (against the null hypothesis of no effect) and a higher prevalence of apparent errors in the reporting of statistical results. The unwillingness to share data was particularly clear when reporting errors had a bearing on statistical significance.

Conclusions: Our findings on the basis of psychological papers suggest that statistical results are particularly hard to verify when reanalysis is more likely to lead to contrasting conclusions. This highlights the importance of establishing mandatory data archiving policies.

There has been a lot of debate in recent years about how open science should be. Should people be required to share data? Should they be required to share code? And if so, how early in the research-to-publication process, and what restrictions should be put on re-use? Most of the arguments have been based on principle, but this work by Wicherts et al. provides a powerful empirical basis for the "pro" side. Simply put, they found that the more reluctant researchers are to share their data, the weaker the evidence in their papers tends to be, and the more likely those papers are to contain apparent errors in their reported statistics. While correlation doesn't imply causation, it's hard to believe that there isn't a connection. Going forward, we will therefore try to note where the data (and analytic code) used in various papers is publicly available.
