It’s not a great term, the “reproducibility crisis”.
Most don’t think we are actually in a crisis, but I thought that by now practically everyone in various scientific fields had heard of it. In the last few years the reproducibility issue has been covered several times by Nature News, by major newspapers such as the New York Times, and by countless websites, often with truly crisis-level headlines. I first blogged about it in 2012, and it wasn’t new then.
The heart of the issue is that large-scale replication attempts usually fail to reproduce the findings of the original studies. Or at least, they fail to yield the statistically significant finding that the original study did, suggesting that the original study did not have much statistical power and the original authors got lucky, or that it’s a spurious result, or that the effect of interest is not very robust to the methodological variation associated with replication attempts. Across preclinical cancer research, economics, and experimental psychology, the results have been similarly depressing. As an editor I’ve been involved in two attempts to replicate individual studies, and both yielded null results (one on ego depletion, and one on how grammatical aspect affects judgments about a criminal).
There are certainly people (Harvard professors, in fact) who still say that “the reproducibility of psychological science is quite high and, in fact, statistically indistinguishable from 100%”. But if you had asked me how many psychology academics believe that it is a significant problem I would have said at least half. Experimental psychology in particular has seen a raft of large-scale replication attempts and very public failures to replicate. Before the Reproducibility Project that attempted to replicate 100 studies, there was Many Labs 1. Now, Many Labs 2 has finished data collection and Many Labs 3 is in process. The conversations around replications have reached near-meme levels of rhetoric. If people in any area should have heard of the reproducibility crisis by now, it’s psychology researchers.
I find out about a lot of replication attempts on twitter, with reproducibility news showing up on my feed on a near-daily basis. I was wondering whether the reproducibility crisis is much of a thing to your average psychology academic. Note: your average psychology academic is not on twitter.
In a seemingly self-defeating effort, I set up a poll on twitter of people not on twitter:
Psychology academics: Ask a colleague who’s not on twitter whether they’ve heard of the “reproducibility crisis”
— Alex Holcombe (@ceptional) April 7, 2016
I was expecting about five responses. Maybe ten. But I got fifty-eight!
Let’s acknowledge that this is an unscientific sample with a lot of selection bias. Who knows how these 58 people got their datum? (I say “datum”, not data, because you can only vote once per twitter account).
We have 26% of people who have never heard of the reproducibility crisis, 40% who are skeptical that it’s a problem, and only 34% who think it’s a major problem.
It could be that people saw this as an entertaining opportunity to troll me. I doubt that and suspect that people actually had a real-world collegial interaction as a result of this tweet. They may have avoided colleagues who they’d previously spoken to about reproducibility. And they may have skipped the nose-to-the-grindstone types, continuing on to a colleague with an open office door. That would be pretty good, but it’s also possible they went straight to the prof they know doesn’t keep up with the times.
With biases in mind, let’s consider the numbers. We’ve got 66%, 38 people, who have either never heard of the crisis or are skeptical that it’s a problem. The 95% confidence interval on that figure (adjusted Wald method) runs from 79% down to 55%. That’s a sobering lower limit.
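For anyone who wants to check this kind of interval, here is a minimal Python sketch of the adjusted Wald (Agresti-Coull) calculation. The function name is made up for illustration, and the count of 38 out of 58 is reconstructed from the rounded poll percentages, so treat it as an assumption rather than the exact tally. With those inputs the sketch gives roughly 53% to 76%, a little lower than the figures quoted above; the gap may come from the true underlying counts or from a slightly different variant of the method.

```python
import math


def adjusted_wald_interval(successes, n, z=1.959964):
    """Adjusted Wald (Agresti-Coull) confidence interval for a proportion.

    Adds z^2/2 pseudo-successes and z^2/2 pseudo-failures to the observed
    counts, then applies the ordinary Wald formula to the adjusted counts.
    The default z corresponds to a two-sided 95% interval.
    """
    n_adj = n + z ** 2                        # adjusted sample size
    p_adj = (successes + z ** 2 / 2) / n_adj  # adjusted proportion
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp to [0, 1] so the interval is always a valid proportion range.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


# Assumed counts: 38 of 58 respondents (reconstructed from the percentages).
lower, upper = adjusted_wald_interval(38, 58)
print(f"{lower:.1%} to {upper:.1%}")
```

Either way, the lower limit stays above half, so the point of the paragraph survives the choice of method.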
These people are out there, in significant numbers. We ought to keep this in mind when communicating with people about the latest replication failure or the latest call for publishing reform, be it greater disclosure in methods sections or preregistration. Many people will continue to see these things as burdensome solutions for a problem that may not exist. Myself, I like to think of preregistration as forcing people to keep their grubby little p-hacking hands from contaminating what could otherwise be a truth-revealing beautiful bit of science.
The positive side is that a very large number of people have become convinced of the value of preregistration in a very short span of time. Already a new experimental psychology journal that publishes only preregistered studies has been created, and at many journals, a new preregistered article type has popped up. Put in perspective, progress has indeed been quite rapid. Consider the molasses-like slog of the open access movement. It took decades of proselytising, explainers, and news for everyone to have a rough idea of what open access is. And still today you find people assuming that the only existing or viable route is author-pays (even though there are thousands of open-access journals that charge authors nothing). Open access is a complex issue, and so are reproducibility issues. We should expect it to take a long time to get very far with either.