Reproducibility in research: taming a “complex beast”

The ability to reproduce research affects the credibility of science; Prof. John Ioannidis of Stanford points out what’s wrong and how to fix it

John P.A. Ioannidis, MD, DSc, Professor and Co-Director of the Meta-Research Innovation Center at Stanford (METRICS), explains his research on reproducibility as a keynote speaker at Elsevier’s Research Funders Summit. (Photo by Alison Bert)

"Science is a success story … Unfortunately we are too successful."

With that, Dr. John P.A. Ioannidis, Professor and Co-Director of the Meta-Research Innovation Center at Stanford (METRICS), begins his argument for improving the transparency and reproducibility of research. Citing a 2016 paper he co-authored in JAMA, he reveals a striking finding:

Across the biomedical literature, when we text-mined everything that was available since 1990 … 96 percent of those that had p values had statistically significant results. So this is becoming a boring nuisance. Significance and discovery, innovation – it sounds great, but we have too much of that, and only a small fraction of that will reproduce. Only a small fraction of that will eventually translate to something useful. It takes a very long time to do that – an average of 25 years – and even when we get there, it gets refuted by even larger and better studies done downstream.

Therefore there’s a lot of angst about reproducibility.

Prof. Ioannidis made these remarks as a keynote speaker at a recent Research Funders Summit presented by Elsevier. Referring to reproducibility as “a complex beast,” he said there is much room for improvement in the reproducibility and usefulness of research in biomedicine and beyond, and he proposed various incentives to improve research practices.

Reproducibility is an area of focus at Elsevier as well.

“Professor Ioannidis is correct that reproducibility is a complex beast, but we're learning more every day about how to tame this beast,” said Dr. William Gunn, Director of Scholarly Communications at Elsevier. “One of the most impactful interventions is Registered Reports, an article format where the experimental proposal is peer reviewed before data collection, helping to align academic practices and values.” This option is now available at hundreds of journals, including many of Elsevier’s, and we encourage our editors to take advantage of this option.

As Prof. Ioannidis pointed out, the ability to reproduce results affects everything from the accuracy and credibility of research to the likelihood of translating it into something useful – and ultimately the public’s trust in science.

A paper Prof. Ioannidis co-authored in JAMA shows that 96% of the biomedical literature published between 1990 and 2015 claims statistically significant results. Prof. Ioannidis suggests redefining “statistical significance.”

Prof. Ioannidis explained that while there is geometric growth in the number of papers using the term “reproducibility,” the phenomenon of reproducibility is “a complex beast.”

There are three kinds of reproducibility, the requirements of which differ from field to field, he said:

  1. Reproducibility of methods: the ability to understand or repeat as exactly as possible the experimental and computational procedures.
  2. Reproducibility of results: the ability to produce corroborating results in a new study, having followed the same experimental methods.
  3. Reproducibility of inferences: the making of knowledge claims of similar strength from a study replication.

Meanwhile, he said, variations from one field to the next affect what “reproducible research” means:

Different fields have different requirements for claiming that they’re happy with their reproducibility. You cannot have the same degree of determinism in physics versus biology versus social sciences; the signal-to-measurement error ratio will be substantially different; the complexity of design will vary. Whether we use statistics and what kind of statistics we use, whether we even have a hypothesis or it’s mostly descriptive science. … What are the criteria for truth claims?

And finally, what are we doing with this? Is it high science that’s curiosity driven, and maybe we’ll have an application 30 or 50 or 100 years down the road, or is it something that has immediate application to human lives?

To complicate things, he added, there are problems inherent to “small data” studies, where siloed investigators “need to come up with extravagant results” to get renewed funding, and “overpowered” big data studies, where “I have too much data that I don’t know what to do with; they just keep accumulating while I’m sleeping” and “everything I tap will be statistically significant.”
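The “overpowered” problem he describes can be illustrated with a quick simulation (a hypothetical sketch, not taken from the talk): with an enormous sample, even a negligible true difference between two groups produces a tiny p-value, while the same effect in a small study looks unremarkable.

```python
import math
import random

def two_sample_p(n, true_diff, seed=0):
    """Simulate two groups whose means differ by true_diff (in standard
    deviations) and return the two-sided p-value from a simple z-test."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(true_diff, 1.0) for _ in range(n)]
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    se = math.sqrt(2.0 / n)  # standard error of the mean difference (unit variances)
    z = (mean_b - mean_a) / se
    return math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value

# A negligible effect of 0.01 standard deviations:
p_big = two_sample_p(1_000_000, 0.01)  # huge n: p typically far below 0.05
p_small = two_sample_p(100, 0.01)      # same tiny effect, modest n: usually not significant
print(p_big, p_small)
```

The point is that statistical significance scales with sample size, not with practical importance, which is why “everything I tap will be statistically significant” in a sufficiently large dataset.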

Also, while data sharing is becoming more common, he said, “much of that information is shared without understanding of what exactly is being shared. Normally the recipient doesn’t know what’s being shared, but even the data generator doesn’t know what’s being shared.”

In the extreme, that can lead to exaggerated claims in the media, which undermine the public’s trust in science.

Prof. Ioannidis used this slide as an example of what can go wrong when scientific information is shared without understanding it.

What’s the solution?

Common practices and possible solutions across the workflow for addressing publication biases. Red problems and green solutions are mostly controllable by researchers; purple problems and blue solutions are mostly controllable by journal editors. Funding agencies may also be major players in shaping incentives and reward systems.

In a 2014 paper in Trends in Cognitive Sciences, Prof. Ioannidis and his co-authors suggest research practices that may help increase the proportion of true research findings:

  • Large-scale collaborative research
  • Adoption of replication culture
  • Registration (of studies, protocols, analysis codes, datasets, raw data, and results)
  • Sharing (of data, protocols, materials, software, and other tools)
  • Reproducibility practices
  • Containment of conflicted sponsors and authors
  • More appropriate statistical methods
  • Standardization of definitions and analyses
  • More stringent thresholds for claiming discoveries or “successes”
  • Improvement of study design standards
  • Improvements in peer review, reporting, and dissemination of research
  • Better training of scientific workforce in methods and statistical literacy

For training, Prof. Ioannidis suggested funding agencies step in:

Who’s going to teach people statistics and data science and methods? I think that’s a big challenge, but funding agencies could really make a difference by investing a small part of their portfolio to support more such initiatives. Several funding agencies have already started doing that, but I think a bit more would really be catalytic to make sure scientists either train themselves at a level that would be compatible with not being illiterate statistically or at least they know they need to team up with a cognizant data scientist in whatever they do.


Written by

Alison Bert, DMA


As Executive Editor of Strategic Communications at Elsevier, Dr. Alison Bert works with contributors around the world to publish daily stories for the global science and health communities. Previously, she was Editor-in-Chief of Elsevier Connect, which won the 2016 North American Excellence Award for Science & Education.

Alison joined Elsevier in 2007 from the world of journalism, where she was a business reporter and blogger for The Journal News, a Gannett daily newspaper in New York. In the previous century, she was a classical guitarist on the music faculty of Syracuse University. She received a doctorate in music from the University of Arizona, was Fulbright scholar in Spain, and studied in a master class with Andrés Segovia.

