the Marc Hauser affair means we need to require more posting of original data

First were the Climategate emails. There, lack of transparency in climate data analyses and climate models contributed to the doubts of skeptics regarding climate change, and made it easier for the skeptics to convince the public that there is good reason for skepticism.

Now, the Marc Hauser affair has cast a shadow across another sub-area of science.

How can we prevent these scientific fiascos from occurring in the future? We can’t, but a little reform can reduce them. We should shift publication practices in a way that deters fraud and sloppy data analysis and at the same time increases transparency. This will improve everyone’s confidence in reported results.

The change I’m suggesting (and I am by no means the first to advocate for this) is to require authors to post original data on the web, e.g. in the digital repositories provided by many institutions and consortiums. In the case of Marc Hauser’s work, for example, there’s no reason why the original videotapes (the analysis of which was apparently one of the main disagreements between him and others) could not have been provided at the time of submission and published online at the time of paper publication. Of course, the particular data or numbers that are actually appropriate to post vary widely across fields (depending on whether the authors might have a right to further mine the data for further publications, human data privacy considerations, etc.).

In many areas, posting the data, especially in a way that makes it interpretable by others, would be a lot of work for authors. However, it would be worth it: reducing fraud in science and increasing public confidence is worth a lot! Note that even in areas where scientists could ‘fake’ the data (as some small percentage of scientists always will consider doing; see the results from anonymous surveys of scientists mentioned here), requiring postings of raw numbers will be a deterrent. Making up raw numbers wholesale goes very far toward outright, unvarnished cheating, and even when people do it, they often make up numbers with systematic biases that can be detected by algorithms designed to detect accounting fraud.
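To give a flavor of what such an algorithm might look like, here is a minimal sketch of one check commonly used in forensic accounting: comparing the leading digits of a set of numbers against Benford's law. The choice of Benford's law is my assumption about the kind of algorithm meant, and the function names `first_digit` and `benford_chi_square` are made up for illustration. Many legitimate scientific datasets do not follow Benford's law at all (for example, measurements confined to a narrow range), so a large statistic is a reason to look more closely, not evidence of fraud.

```python
import math
from collections import Counter


def first_digit(x):
    """Return the leading nonzero digit of a finite number, or None for zero."""
    x = abs(x)
    if x == 0:
        return None
    # Scientific notation puts the leading digit first, e.g. '3.14...e+02'.
    return int(f"{x:.15e}"[0])


def benford_chi_square(values):
    """Chi-square statistic of observed first digits against Benford's law.

    Benford's law predicts that digit d leads with probability log10(1 + 1/d).
    With 8 degrees of freedom, a statistic above roughly 15.5 would be
    unusual (5% level) for data that really follow Benford's law.
    """
    digits = [d for d in map(first_digit, values) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one nonzero value")
    counts = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    # Hypothetical inputs, not real data: powers of two track Benford's law
    # closely, while 100..999 has every leading digit equally often.
    print(benford_chi_square([2 ** k for k in range(1, 200)]))
    print(benford_chi_square(range(100, 1000)))
```

The first call should give a small statistic and the second a very large one, which is the pattern one would hope to see if invented numbers have their leading digits spread too evenly.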

At PLoS ONE we were already discussing strengthening the guidelines for publishing data, and hopefully recent events will help push the journal to do it. But this norm needs to be built across many journals and institutions, to make it part of the culture of science generally.



