An executive summary of science’s replication crisis

To evaluate and build on previous findings, a researcher sometimes needs to know exactly what was done before.

Computational reproducibility is the ability to take the raw data from a study and re-analyze it to reproduce the final results, including the statistics.

Empirical reproducibility is demonstrated when another team repeats the study and finds again the critical results reported by the original.
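In code terms, a check of the first of these, computational reproducibility, amounts to roughly the following. This is a minimal sketch under assumed placeholders: the file name study_raw_data.csv, the group and score columns, and the reported value of 0.42 are hypothetical, not taken from any real study.

```python
# Minimal sketch of a computational reproducibility check: re-run the analysis
# on the shared raw data and compare the result to the number in the paper.
# The file name, column names, and reported value are hypothetical placeholders.
import csv

REPORTED_MEAN_DIFFERENCE = 0.42  # value claimed in the (hypothetical) paper

with open("study_raw_data.csv", newline="") as f:
    rows = list(csv.DictReader(f))

treatment = [float(r["score"]) for r in rows if r["group"] == "treatment"]
control = [float(r["score"]) for r in rows if r["group"] == "control"]

recomputed = sum(treatment) / len(treatment) - sum(control) / len(control)

print(f"Reported:   {REPORTED_MEAN_DIFFERENCE:.2f}")
print(f"Recomputed: {recomputed:.2f}")
if abs(recomputed - REPORTED_MEAN_DIFFERENCE) < 0.005:
    print("The reported statistic is computationally reproducible.")
else:
    print("Could not reproduce the reported statistic from the raw data.")
```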

Poor computational reproducibility

Economics Reinhart and Rogoff, two respected Harvard economists, reported in a 2010 paper that growth slows when a country’s debt rises to more than 90% of GDP. Austerity backers in the UK and elsewhere invoked this finding many times. A postgrad failed to replicate the result, and Reinhart and Rogoff sent him their Excel file. They had unwittingly failed to select the entire list of countries as input to one of their formulas. Fixing this diminished the reported effect, and a variant of the original method yielded the opposite of the result that had been used to justify billions of dollars’ worth of national budget decisions.
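To make the nature of that error concrete, here is a toy sketch of how a formula applied to an incomplete range of rows can reverse the sign of an average. The country labels and growth figures below are invented for illustration only; they are not the Reinhart and Rogoff data.

```python
# Toy illustration (invented numbers, not the Reinhart and Rogoff data) of how
# a spreadsheet formula that omits part of the country list can flip a result.

# Hypothetical average GDP growth (%) for ten high-debt country-years.
growth_by_country = {
    "A": -1.9, "B": 0.5, "C": -0.3, "D": 1.1, "E": -0.6,
    "F": 2.4, "G": 3.1, "H": 2.7, "I": 1.8, "J": 2.9,
}
countries = list(growth_by_country)

# Intended formula: average over all ten rows (like =AVERAGE(B2:B11)).
full_mean = sum(growth_by_country.values()) / len(countries)

# Accidental formula: the selected range stops after five rows
# (like =AVERAGE(B2:B6)), silently excluding the remaining countries.
subset = countries[:5]
partial_mean = sum(growth_by_country[c] for c in subset) / len(subset)

print(f"All {len(countries)} countries: mean growth {full_mean:+.2f}%")      # +1.17%
print(f"Truncated range ({len(subset)} countries): {partial_mean:+.2f}%")    # -0.24%
```

With the full list the average growth is positive; with the truncated range it appears negative. A sign flip of that kind is exactly what matters when the number feeds policy decisions.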

A systematic study found that only about 55% of studies could be reproduced, and that’s only counting studies for which the raw data were available (Vilhuber, 2018).

Cancer biology The Reproducibility Project: Cancer Biology found that for none of 51 papers could a full replication protocol be designed without input from the original authors (Errington, 2019).

Not sharing data or analysis code is common. Ioannidis and colleagues (2009) could reproduce only about 2 of 18 microarray-based gene-expression studies, mostly because of incomplete data sharing.

Artificial intelligence (machine learning) A survey of reinforcement learning papers found that only about 50% included code, and a study of publications on neural-network recommender systems found that only 40% were reproducible (Barber, 2019; Ferrari Dacrema et al., 2019).

Poor empirical reproducibility

Wet-lab biology Amgen researchers were shocked when they were able to replicate only 11% of 53 landmark studies in oncology and hematology (Begley and Ellis, 2012).

“I explained that we re-did their experiment 50 times and never got their result. He said they’d done it six times and got this result once, but put it in the paper because it made the best story.” Begley

A Bayer team reported that ~25% of published preclinical studies could be validated to the point at which projects could continue (Prinz et al., 2011). Because of poor computational reproducibility and poor methods sharing, the most careful effort so far (Errington, 2013) decided that, of 50 high-impact cancer biology studies, only 18 replications could be fully attempted; it has finished only 14, of which 9 are partial or full successes.

[Image: panel from Maki Naro’s 2016 cartoon.]

Social sciences

62% of 21 social science experiments published in Science and Nature between 2010 and 2015 replicated, using samples on average five times bigger than the original studies to increase statistical power (Camerer et al., 2018).

61% of 18 laboratory economics experiments successfully replicated (Camerer et al., 2016).

39% of 100 experimental and correlational psychology studies replicated (Nosek et al., 2015).

53% of 51 other psychology studies replicated (Klein et al., 2018; Ebersole et al., 2016; Klein et al., 2014), and ~50% of 176 further psychology studies replicated (Boyce et al., 2023).

Medicine

Trials: data for >50% never made available, ~50% of outcomes not reported, and authors’ data lost at ~7% per year (DeVito et al., 2020).

I list six of the causes of this sad state of affairs in another post.

References

Barber, G. (2019). Artificial Intelligence Confronts a “Reproducibility” Crisis. Wired. https://www.wired.com/story/artificial-intelligence-confronts-reproducibility-crisis/

Begley, C. G., & Ellis, L. M. (2012). Raise standards for preclinical cancer research. Nature, 483(7391), 531–533.

Boyce, V., Mathur, M., & Frank, M. C. (2023). Eleven years of student replication projects provide evidence on the correlates of replicability in psychology. PsyArXiv. https://doi.org/10.31234/osf.io/dpyn6

Bush, M., Holcombe, A. O., Wintle, B. C., Fidler, F., & Vazire, S. (2019). Real problem, wrong solution: Why the Nationals shouldn’t politicise the science replication crisis. The Conversation. http://theconversation.com/real-problem-wrong-solution-why-the-nationals-shouldnt-politicise-the-science-replication-crisis-124076

Camerer, C. F., et al. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2(9), 637–644. https://doi.org/10.1038/s41562-018-0399-z

Camerer, C. F., et al. (2016). Evaluating replicability of laboratory experiments in economics. Science, 351, 1433–1436. https://doi.org/10.1126/science.aaf0918

DeVito, N. J., Bacon, S., & Goldacre, B. (2020). Compliance with legal requirement to report clinical trial results on ClinicalTrials.gov: A cohort study. The Lancet, 0(0). https://doi.org/10.1016/S0140-6736(19)33220-9

Ebersole, C. R., et al. (2016). Many Labs 3: Evaluating participant pool quality across the academic semester via replication. Journal of Experimental Social Psychology, 67, 68–82. https://doi.org/10.1016/j.jesp.2015.10.012

Ferrari Dacrema, M., Cremonesi, P., & Jannach, D. (2019). Are we really making much progress? A worrying analysis of recent neural recommendation approaches. Proceedings of the 13th ACM Conference on Recommender Systems, 101–109. https://doi.org/10.1145/3298689.3347058

Errington, T. (2019). https://twitter.com/fidlerfm/status/1169723956665806848

Errington, T. M., Iorns, E., Gunn, W., Tan, F. E., Lomax, J., & Nosek, B. A. (2014). An open investigation of the reproducibility of cancer biology research. ELife, 3, e04333. https://doi.org/10.7554/eLife.04333

Errington, T. (2013). https://osf.io/e81xl/wiki/home/

Glasziou, P., et al. (2014). Reducing waste from incomplete or unusable reports of biomedical research. The Lancet, 383(9913), 267–276. https://doi.org/10.1016/S0140-6736(13)62228-X

Ioannidis, J. P. A., Allison, D. B., et al. (2009). Repeatability of published microarray gene expression analyses. Nature Genetics, 41(2), 149–155. https://doi.org/10.1038/ng.295

Klein, R. A., et al. (2018). Many Labs 2: Investigating Variation in Replicability Across Samples and Settings. Advances in Methods and Practices in Psychological Science, 1(4), 443–490. https://doi.org/10.1177/2515245918810225

Klein, R. A., et al. (2014). Investigating Variation in Replicability. Social Psychology, 45(3), 142–152. https://doi.org/10.1027/1864-9335/a000178

Nosek, B. A., Aarts, A. A., Anderson, C. J., Anderson, J. E., Kappes, H. B., & the Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716

Prinz, F., Schlange, T., & Asadullah, K. (2011). Believe it or not: How much can we rely on published data on potential drug targets? Nature Reviews Drug Discovery, 10, 712.

Vilhuber, L. (2018). Reproducibility and Replicability in Economics. https://www.nap.edu/resource/25303/Reproducibility%20in%20Economics.pdf