Accurate statistical reporting is an essential driver of scientific progress in many fields. The calibre of the scientific literature depends on the strength of its statistical conclusions: these inform the editorial decisions of reviewers and editors, and serve as a basis for other researchers to build upon. When things go wrong, however, the impact can be considerable. Some errors may cause a seemingly significant result to become non-significant, and vice versa. Even small statistical errors can cause problems, particularly now that more knowledge is being aggregated through meta-analyses.
Dr. Michèle Nuijten and colleagues at Tilburg University and the University of Amsterdam decided it was time to check how accurate scientific reporting has been in the field of psychology. To accomplish this, they created an automated program to check the p-values of published papers: statcheck. They tested more than 250,000 p-values published in eight psychology journals between 1985 and 2013 and found that half of the papers sampled contained at least one inconsistently reported p-value (Nuijten et al., 2015).
Statcheck is a web-based program that checks the internal consistency of statistical significance tests reported in a standard format. A standard significance test reports three numbers: the test statistic, its degrees of freedom, and the p-value; given any two of these, the third is determined. Statcheck makes sure that all three numbers line up. If they don't, the problem is flagged to the user.
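The core idea can be illustrated in a few lines. The sketch below is not statcheck itself (which is an R-based tool handling t, F, r, χ² and other tests); it is a minimal, hypothetical recompute-and-compare check using a z statistic, whose two-tailed p-value has a simple closed form. The function names are illustrative assumptions.

```python
import math

def two_tailed_p_from_z(z):
    # Two-tailed p-value for a standard-normal test statistic:
    # p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

def consistent(z, reported_p, decimals=2):
    # Treat the reported p as rounded to `decimals` places and
    # compare it to the recomputed p at the same precision.
    return round(two_tailed_p_from_z(z), decimals) == round(reported_p, decimals)

# A report of "z = 2.32, p = .02" is internally consistent,
# while "z = 2.32, p = .03" would be flagged.
print(consistent(2.32, 0.02))  # True
print(consistent(2.32, 0.03))  # False
```

Real reports use t, F, or χ² statistics with degrees of freedom, so the recomputation needs the corresponding distribution functions, but the comparison logic is the same.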
Some journals have since started using statcheck to screen submitted papers, much as plagiarism software is used. The result is then assessed by an editor, and any issues are raised with the author(s) if necessary. One of those journals is the Journal of Experimental Social Psychology, edited by Professor Roger Giner-Sorolla at the University of Kent. We asked Roger about his experience using statcheck so far, to get a better sense of how the program is affecting the publication process and how it might be used for other journals and subject areas in the future.
Why did you decide to pilot statcheck on the Journal of Experimental Social Psychology?
It is better to have correct than incorrect statistics, as much as we can help it. Errors are embarrassing and they pop up more often than people think.
Is it working so far?
I think so. We have sent back many papers for correction that drew incorrect conclusions and, although we have no way of knowing how many authors are using it on their own, the number of errors seems to have fallen since we first introduced it. I would say about half of all true error flags are either trivial errors (off by .01 or so, and arguably due to the rounding that occurs in statistical software), correct reporting of one-tailed tests (which statcheck also helps identify), or other issues that turn out on inspection not to be problems. The other half are worth correcting, especially in the not-so-rare case where the discrepancy crosses a threshold of significance.
Do you think this will become as normal as plagiarism software in future?
I think so, especially if there is a way to integrate it seamlessly within the editorial system structure instead of having to feed manuscripts through a third-party site as we do now. Using the site and scrolling down looking for errors takes less than half a minute. But if errors are found, judging them and what they mean takes a bit longer. Even for the most error-ridden papers, using statcheck has never taken more than 5 minutes.
What effect do you hope statcheck will have within the scholarly community?
It will reward the people who have been diligently double-checking their figures all these years, and give an easy way for everyone else to spot obvious errors they have made.
There has been a lot of talk recently about how we measure statistical significance, what do you think is the future of the p-value?
I think the p-value has a good future, but it will be less and less central to scholarly decisions, as people realize you have to look at other things that are just as important.
Looking forward, where does statcheck go from here?
I hope that statcheck will expand to catch more kinds of statistical errors!
At Elsevier we intend to explore how we can start using statcheck on more journals in psychology and in other disciplines. Statcheck is a free tool to which authors can upload their own papers if they wish. You can find more information about statcheck and the Journal of Experimental Social Psychology here.
Nuijten, M. B., Hartgerink, C. H. J., van Assen, M. A. L. M., Epskamp, S., & Wicherts, J. M. (2015). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 48(4), 1205–1226. doi:10.3758/s13428-015-0664-2