Committing research fraud is easy. Let’s make it harder.

originally published by The Chronicle of Higher Education as “How to Stop Academic Fraudsters” (I didn’t choose that title)

“Hi Alex, this is not credible.”

I’ll never forget that email. It was 2016, and I had been helping psychology researchers design studies that, I hoped, would replicate important and previously published findings. As part of a replication-study initiative that I and the other editors had set up at the journal Perspectives on Psychological Science, dozens of labs around the world would collect new data to provide a much larger dataset than that of the original studies.

With the replication crisis in full swing, we knew that data dredging and other inappropriate research practices meant that some of the original studies were unlikely to replicate. But we also thought our wide-scale replication effort would confirm some important findings. Upon receiving the “this is not credible” message, however, I began to be haunted by another possibility — that at least one of those landmark studies was a fraud.

The study in question was reminiscent of many published in high-impact journals in the mid-2010s. It indicated that people’s mood or behavior could be shifted a surprising amount by a subtle manipulation. The study had found that people became happier when they described a previous positive experience in a verb tense suggesting an ongoing experience — rather than one set firmly in the past. Unfortunately for psychology’s reputation, social-priming studies like that had been falling like a house of cards, and our replication failed, too. In response, the researchers behind the original study submitted a new experiment that appeared to shore up their original findings. With their commentary, the researchers provided the raw data for the new study, which was unusual at the time, but it was our policy to require it. This was critical to what happened next.

One scholar involved in the replication attempt had a close look at the Excel spreadsheet containing the new data. The spreadsheet had nearly 200 rows, one for each person who had supposedly participated in the experiment. But the responses of around 70 of them appeared to be exact duplicates of other people in the dataset. When the duplicates were removed, the main result was no longer statistically significant.

After thanking the scholar who had caught the problem, I pointed out the data duplication to the researchers behind the original study. They apologized for what they described as an innocent data-processing mistake. Then, rather conveniently, they discovered some additional data they said they had accidentally omitted. With that data added in, the result was statistically significant again. By this point, the scholar who had caught the duplication had had enough. The new data, and possibly the old, were no longer credible.

I conducted my own investigation of the Excel data. I confirmed the irregularities and found even more inconsistencies when I examined the raw data exactly as downloaded from the online service used to run the study. The other journal editors and I still didn’t believe that the reason for the irregularities was fraud — all along, the researchers behind the original study had seemed very nice and were very obliging about our data requests — but we decided that we shouldn’t publish the commentary that accompanied the questionable new data. We also reported them to their university’s research-integrity office. After an investigation, the university found that the data associated with the original study had been altered in strategic ways by a graduate student who had also produced the data for the new study. The case was closed, and the paper was retracted, but the cost had been substantial, involving thousands of hours of work by dozens of people involved in the replication, the university investigators, and at least one harried journal editor (me).

More recently, two high-profile psychology researchers, Francesca Gino of Harvard and Dan Ariely of Duke, faced questions about their published findings. The data in Excel files they have provided show patterns that seem unlikely to have occurred without inappropriate manipulation of the numbers. Indeed, one of Ariely’s Excel files contains signs of the sort of data duplication that occurred with the project I handled back in 2016.

Ariely and Gino both maintain that they never engaged in any research misconduct. They have suggested that unidentified others among their collaborators are at fault. Well, wouldn’t it be nice, for them and for all of us, if they could prove their innocence? For now, a cloud of suspicion hangs over both them and their co-authors. As the news has spread and the questions have remained unresolved, the cloud has grown to encompass other papers that Ariely and Gino were involved in, for which clear data records have not yet been produced. Perhaps as much to defend their own reputations as to clean up the scientific record, Gino’s collaborators have launched a project to forensically examine more than 100 of the papers that she has co-authored. This vast reallocation of academic expertise and university resources could, in a better system, be avoided.

How? Researchers need a record-keeping system that indicates who did what and when. I have been using Git to do this for more than a decade. The standard tool of professional software developers, Git allows me to manage my psychology-experiment code, analysis code, and data, and provides a complete digital paper trail. When I run an experiment, the data are recorded with information about the date, time, and host computer. The lines of code I write in R to do my analysis are also logged. An associated website, GitHub, stores all of those records and allows anyone to see them. If someone else in my lab contributes data or analysis, they and their contributions are also logged. Sometimes I even write up the resulting paper through this system, embedding analysis code within it, with every data point and statistic in the final manuscript traceable back to its origin.
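To give a concrete flavor of that paper trail, here is a minimal sketch of the kind of commands involved (the file and folder names are hypothetical, and this is only the flavor of a workflow, not an exact one):

```sh
# Record a newly collected data file in the project's history
git add data/session_042.csv
git commit -m "Add raw data for session 42"

# Later, list who changed which files, and when, with ISO-formatted dates
git log --date=iso --pretty=format:'%h %ad %an %s' -- data/ analysis/
```

Pushing those commits to GitHub is what makes the same record visible to anyone who wants to check it.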

My system is not 100 percent secure, but it does make research misconduct much more difficult. Deleting inconvenient data points would be detectable. Moreover, if trusted timestamping is used, the log of file changes is practically unimpeachable. Git is not easy to learn, but the basic concept of “version history” is today part of Microsoft Word, Google Docs, and other popular software and systems. Colleges and universities should ensure that whatever software their researchers use keeps good records of what the researchers do with their files.

While enabling more recording of version history would be only a small step, it could go a long way. The Excel files that Gino and Ariely have provided have little to no embedded records indicating what changes were made and when. That’s not surprising — their Excel files were created years ago, before Excel could record a version history. Even today, however, with its default setting, Excel deletes from its record any changes older than 30 days. Higher-ed institutions should set their enterprise Excel installations to never delete their version histories. This should also be done for other software that researchers commonly use.

Forensic data sleuthing has found that a worrying number of papers published today contain major errors, if not outright fraud. When the anesthesiologist John Carlisle scrutinized work submitted to the journal he edited, Anaesthesia, he found that of 526 submitted trials, 73 (14 percent) had what seemed to be false data, and 43 (8 percent) were so flawed they would probably be retracted if their data flaws became public (he termed these “zombie” trials). Carlisle’s findings suggest that the literature in some fields is rapidly becoming littered with erroneous and even falsified results. Fortunately, the same record-keeping that allows one to conduct an audit in cases of fraud can also help colleges, universities, journals, and researchers prevent errors in the first place.

Errors will always occur, but they are less likely to cause long-lasting damage if someone can check for them, whether that’s a conscientious member of the research team, a reviewer, or another researcher interested in the published paper. To better check the chain of calculations associated with a scientific claim, more researchers should be writing their articles in a system that can embed code, so that the calculations behind each statistic and point on a plot can be checked. These are sometimes called “executable articles” because pressing a button executes code that can use the original data to regenerate the statistics and figures.
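One widely used system of this kind is R Markdown; below is a minimal sketch (the data file and column names are hypothetical). Every number in the final sentence is recomputed from the raw data whenever the manuscript is rendered.

````markdown
---
title: "Sketch of an executable manuscript"
output: html_document
---

```{r load-data}
# Hypothetical raw-data file and column name, for illustration only
scores <- read.csv("data/raw_scores.csv")
```

The mean score was `r round(mean(scores$score), 2)`
(SD = `r round(sd(scores$score), 2)`, N = `r nrow(scores)`), and each of these
values is traceable back to the raw data file.
````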

Scholars don’t need to develop such systems from scratch. A number of services have sprung up to help those of us who are not seasoned programmers. A cloud service called Code Ocean facilitates the creation of executable papers, preserving the software environment originally used so that the code still executes years later. Websites called Overleaf and Authorea help researchers create such documents collaboratively rather than leaving it all on one researcher’s computer. The biology journal eLife has used a technology called Stencila to permit researchers to write executable papers with live code, allowing a paper’s readers to adjust the parameters of an analysis or simulation and see how that changes its results.

Universities and colleges, in contrast, have generally done very little to address fraud and errors. When I was a Ph.D. student in psychology at Harvard, there were two professors on the faculty who were later accused of fraud. One of them owned up to the fraud and helped get her work retracted. The other, Marc Hauser, “lawyered up” and fought the accusations, but nevertheless he was found by Harvard to have committed scientific misconduct (the U.S. Office of Research Integrity also found him to have fabricated data).

As a result, Harvard had more than a decade after the findings of serious fraud by two of its faculty members to prepare for, and try to prevent, future misconduct. When news of the Gino scandal broke, I was shocked to learn how little Harvard seemed to have improved its policies. Indeed, Harvard scrambled to rewrite its misconduct policies in the wake of the new allegations, opening up the university to accusations of unfair process, and to Gino’s $25-million lawsuit.

The problems go well beyond Harvard or Duke or even the field of psychology. Not long after John Carlisle reported his alarming findings from clinical-trial datasets in anesthesiology, a longtime editor of the prestigious BMJ (formerly the British Medical Journal) suggested that it was time to assume health research is fraudulent until proven otherwise. Today, a number of signs suggest that the problems have only worsened.

Marc Tessier-Lavigne is a prominent neuroscientist and was, until recently, president of Stanford University. He had to resign after evidence emerged of “apparent manipulation of research data by others” in several papers that came from his lab — but not until after many months of dogged reporting by the Stanford student newspaper. Elsewhere in the Golden State, the University of Southern California is investigating the star neuroscientist Berislav Zlokovic over accusations of doctored data in dozens of papers, some of which led to drug trials that are now in progress.

In biology labs like those of Tessier-Lavigne and Zlokovic, the data associated with a scientific paper often include not only numbers but also images from gel electrophoresis or microscopy. An end-to-end chain of certified data provenance there presents a greater challenge than in psychology, where everything involved in an experiment may be in the domain of software. To chronicle a study, laboratory machines and microscopes need to record data in a usable, timestamped format, and must be linked into an easy-to-follow laboratory notebook.

If we want science to be something that society can still trust, we must embrace good data management. The $25 million that Harvard could lose to Gino — while a mere drop in the operating budget — would go far if spent on developing good data-management systems and training researchers in their use. The reputational returns to Harvard, to its scholars, and to academic science in general would repay the investment many times over. It’s time to stop pretending academic fraud isn’t a problem, and to do something about it.

Gates Foundation, and me: Mandate preprints, support peer review services outside of big publishers

Today the Gates Foundation announced that they will “cease support for individual article publishing fees, known as APCs, and mandate the use of preprints while advocating for their review”. I am excited by this news because over the last couple of decades, it’s been disheartening to see large funders continue to pour money down the throats of high-profit multinational publishers.

In their announcement, the Gates Foundation has recommendations for research funders that include the following:

Invest funding into models that benefit the whole ecosystem and not individual funded researchers.

They also state that funders, and researchers, should support innovative initiatives that facilitate peer review and curation separately from traditional publication.

Diamond OA journals, which are free to authors as well as readers, clearly fit the bill, as well as journal-independent review services such as Peer Community In, PreReview, and COAR-Notify. I’m an (unpaid) advisory board member of the Free Journal Network, which supports (and does some light vetting of) diamond OA journals. I’m also an associate editor at the free WikiJournal of Science, Meta-Psychology, and the coming Meta-ROR metascience peer review platform. All of these initiatives are oriented around providing free peer review of preprints.

Such initiatives have had trouble attracting funding, as have preprint servers, despite the enormous benefit that preprint servers provide: rapid dissemination of research, much faster than through journals.

Agreements like Germany’s DEAL (and Australia’s planned deal) facilitate publisher lock-in: when funders or universities have such an agreement, researchers are unfortunately pushed toward high-profit, progress-undermining publishers like Elsevier, because in that case publishing with Elsevier is free for the researcher while it may not be with more progressive and lower-cost publishers. That is why my favorite episodes in the history of such negotiations are the extended periods when German and Californian universities did not have access to Elsevier publications, which pushed researchers away from Elsevier rather than toward it. As Björn Brembs and I wrote in 2017, the best DEAL is no deal. And as an Australian colleague was quoted saying, the proposed agreement with Elsevier would “enshrine a national debt to wealthy international publishers, who were likely to tack on hefty increases once an agreement was reached.”

An executive summary of science’s replication crisis

To evaluate and build on previous findings, a researcher sometimes needs to know exactly what was done before.

Computational reproducibility is the ability to take the raw data from a study and re-analyze it to reproduce the final results, including the statistics.

Empirical reproducibility is demonstrated when the study is done again by another team and the critical results reported in the original study are found again.

Poor computational reproducibility

Economics. Reinhart and Rogoff, two respected Harvard economists, reported in a 2010 paper that growth slows when a country’s debt rises to more than 90% of GDP. Austerity backers in the UK and elsewhere invoked this finding many times. A postgrad failed to reproduce the result, and Reinhart and Rogoff sent him their Excel file. They had unwittingly failed to select the entire list of countries as input to one of their formulas. Fixing this diminished the reported effect, and a variant of the original method yielded the opposite of the result that had been used to justify billions of dollars’ worth of national budget decisions (a minimal sketch of this kind of range error appears after this list of examples).

A systematic study found that only about 55% of studies could be reproduced, and that’s only counting studies for which the raw data were available (Vilhuber, 2018).

Cancer biology. The Reproducibility Project: Cancer Biology found that for none (0%) of 51 papers could a full replication protocol be designed without input from the original authors (Errington, 2019).

Not sharing data or analysis code is common. Ioannidis and colleagues (2009) could reproduce only about 2 of 18 microarray-based gene-expression studies, mostly due to incomplete data sharing.

Artificial intelligence (machine learning). A survey of reinforcement-learning papers found that only about 50% included code, and in a study of publications on neural-net recommender systems, only about 40% were found to be reproducible (Barber, 2019; Ferrari Dacrema et al., 2019).
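Returning to the Reinhart and Rogoff episode above: the mechanics of that kind of spreadsheet mistake are easy to illustrate. The sketch below uses invented numbers (the real analysis was more involved); the point is only that a formula fed a truncated range of countries can tell a very different story than the full range.

```r
# Hypothetical growth rates (%) for a set of high-debt countries; values are invented
growth <- c(-7.6, 2.4, 0.7, 2.9, 1.0, 3.7, 2.5)
mean(growth[1:5])   # a formula that accidentally stops early, like a too-short Excel range
mean(growth)        # the intended calculation over the full list of countries
```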

Poor empirical reproducibility

Wet-lab biology. Amgen researchers were shocked when they were able to replicate only 11% of 53 landmark studies in oncology and hematology (Begley and Ellis, 2012).

“I explained that we re-did their experiment 50 times and never got their result. He said they’d done it six times and got this result once, but put it in the paper because it made the best story.” Begley

A Bayer team reported that ~25% of published preclinical studies could be validated to the point at which projects could continue (Prinz et al., 2011). Due to poor computational reproducibility and poor methods sharing, the most careful replication effort so far (Errington, 2013), which targeted 50 high-impact cancer-biology studies, determined that only 18 could be fully attempted; it has finished only 14, of which 9 are partial or full successes.

From Maki Naro’s 2016 cartoon.

Social sciences

62% of 21 social science experiments published in Science and Nature between 2010 and 2015 replicated, using samples on average five times bigger than the original studies to increase statistical power (Camerer et al., 2018).

61% of 18 laboratory economics experiments successfully replicated (Camerer et al., 2016).

39% of 100 experimental and correlational psychology studies replicated (Nosek et al., 2015).

53% of 51 other psychology studies replicated (Klein et al., 2018; Ebersole et al., 2016; Klein et al., 2014), and ~50% of 176 other psychology studies replicated (Boyce et al., 2023).

Medicine

Trials: data for >50% never made available, ~50% of outcomes not reported, and authors’ data lost at ~7%/year (DeVito et al., 2020).

I list six of the causes of this sad state of affairs in another post.

References

Barber, G. (n.d.). Artificial Intelligence Confronts a “Reproducibility” Crisis. Wired. Retrieved January 23, 2020, from https://www.wired.com/story/artificial-intelligence-confronts-reproducibility-crisis/

Begley, C. G., & Ellis, L. M. (2012). Raise standards for preclinical cancer research. Nature, 483(7391), 531–533.

Boyce, V., Mathur, M., & Frank, M. C. (2023). Eleven years of student replication projects provide evidence on the correlates of replicability in psychology. PsyArXiv. https://doi.org/10.31234/osf.io/dpyn6

Bush, M., Holcombe, A. O., Wintle, B. C., Fidler, F., & Vazire, S. (2019). Real problem, wrong solution: Why the Nationals shouldn’t politicise the science replication crisis. The Conversation. http://theconversation.com/real-problem-wrong-solution-why-the-nationals-shouldnt-politicise-the-science-replication-crisis-124076

Camerer, C. F., et al.,  (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2(9), 637–644. https://doi.org/10.1038/s41562-018-0399-z

Camerer, C. F., et al. (2016). Evaluating replicability of laboratory experiments in economics. Science, 351, 1433–1436. https://doi.org/10.1126/science.aaf0918

DeVito, N. J., Bacon, S., & Goldacre, B. (2020). Compliance with legal requirement to report clinical trial results on ClinicalTrials.gov: A cohort study. The Lancet. https://doi.org/10.1016/S0140-6736(19)33220-9

Ferrari Dacrema, M., Cremonesi, P., & Jannach, D. (2019). Are we really making much progress? A worrying analysis of recent neural recommendation approaches. Proceedings of the 13th ACM Conference on Recommender Systems, 101–109. https://doi.org/10.1145/3298689.3347058

Ebersole, C. R., et al. (2016). Many Labs 3: Evaluating participant pool quality across the academic semester via replication. Journal of Experimental Social Psychology, 67, 68–82. https://doi.org/10.1016/j.jesp.2015.10.012

Errington, T. (2019) https://twitter.com/fidlerfm/status/1169723956665806848

Errington, T. M., Iorns, E., Gunn, W., Tan, F. E., Lomax, J., & Nosek, B. A. (2014). An open investigation of the reproducibility of cancer biology research. ELife, 3, e04333. https://doi.org/10.7554/eLife.04333

Errington, T. (2013). https://osf.io/e81xl/wiki/home/

Glasziou, P., et al. (2014). Reducing waste from incomplete or unusable reports of biomedical research. The Lancet, 383(9913), 267–276. https://doi.org/10.1016/S0140-6736(13)62228-X

Ioannidis, J. P. A., Allison, D. B., et al. (2009). Repeatability of published microarray gene expression analyses. Nature Genetics, 41(2), 149–155. https://doi.org/10.1038/ng.295

Klein, R. A., et al. (2018). Many Labs 2: Investigating Variation in Replicability Across Samples and Settings. Advances in Methods and Practices in Psychological Science, 1(4), 443–490. https://doi.org/10.1177/2515245918810225

Klein, R. A., et al. (2014). Investigating Variation in Replicability. Social Psychology, 45(3), 142–152. https://doi.org/10.1027/1864-9335/a000178

Nosek, B. A., Aarts, A. A., Anderson, C. J., Anderson, J. E., Kappes, H. B., & the Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.

Prinz, F., Schlange, T., & Asadullah, K. (2011). Believe it or not: How much can we rely on published data on potential drug targets? Nature Reviews Drug Discovery, 10, 712.

Vilhuber, L. (2018). Reproducibility and Replicability in Economics https://www.nap.edu/resource/25303/Reproducibility%20in%20Economics.pdf

A legacy of skepticism and universalism

Many of the practices associated with modern science emerged in the early days of the Royal Society of London for Improving Natural Knowledge, which was founded in 1660. Today, it is usually referred to as simply “the Royal Society”. When the Royal Society chose a coat of arms, they included the words Nullius in verba.

Nullius in verba is usually taken to mean “Take nobody’s word for it”, which was a big departure from tradition. People previously had mostly been told to take certain things completely on faith, such as the proclamations of the clergy and even the writings of Aristotle. 

In the early 1600s, René Descartes had written a book urging people to be skeptical of what others claim, no matter who they are.

René Descartes. Image: public domain.

This caught on in France, even among the public — many people started referring to themselves as Cartesians. Meanwhile in Britain, the ideas of Francis Bacon were becoming influential. His skepticism was less radical than Descartes’, and included many practical suggestions for how knowledge could be advanced.

Francis Bacon in 1616. Image: public domain.

Bacon’s mix of skepticism with optimism about advancing knowledge using observation led, in 1660, to the founding in London of “a Colledge for the Promoting of Physico-Mathematicall Experimentall Learning”. This became the Royal Society.

The combination of skepticism and the opening up of knowledge advancement to contemporary people, not just traditional authorities, set the stage for the success of modern science. When multiple skeptical researchers take a close look at the evidence behind a new claim and are unable to find major problems with the evidence, everyone can then be more confident in the claim. As the historian David Wootton has put it, “What marks out modern science is not the conduct of experiments”, but rather “the formation of a critical community capable of assessing discoveries and replicating results.”

Taking the disregard of traditional authority further, in the 20th century the sociologist Robert Merton suggested that scientists value universalism. By universalism, Merton meant that in science, claims are evaluated without regard to the sort of person providing the evidence. Evidence is evaluated by scientists, Merton wrote, based on “pre-established impersonal criteria”. 

Universalism provides a vision of science that is egalitarian, and universalism is endorsed by large majorities of today’s scientists. However, those who endorse it don’t always follow it in practice. Scientific organizations such as the Royal Society can be elitist. For example, sometimes the scholarly journals that societies publish treat reports by famous researchers with greater deference than those by other researchers.

Placing some trust in authorities (such as famous researchers) is almost unavoidable, because in life we have to make decisions about what to do even when we can’t be completely certain of the facts. In such situations, it can be appropriate to “trust” authorities, believing their proclamations. We don’t have the resources to assess all kinds of scientific evidence ourselves, so we have to look to those who seem to have a track record of making well-justified claims in a particular area. But when it comes to the development of new, cutting-edge knowledge, science thrives on the skepticism that drives the behavior of some researchers.

Together, the values of communalism, skepticism, and a mixture of universalism and elitism shaped the growth of scientific institutions, including the main way in which researchers officially communicated their findings: through academic journals.

The Replication Crisis: the Six P’s

In a clever bit of rhetoric, Professor Dorothy Bishop came up with “the four horsemen of irreproducibility”: publication bias, low statistical power, p-hacking, and HARKing. In an attempt at more complete coverage of the causes of the replication crisis, here I’m expanding on Dorothy’s four horsemen by adding two more causes and using different wording. This gives me six P’s of the replication crisis! Not super-catchy, but I think this is useful.

1. For me, p-hacking was always the first thing that came to mind as a reason that many published results don’t replicate. Ideally, when there is nothing to be found in a comparison (such as no real difference between two groups), the p < .05 criterion used in many sciences means that only 5% of studies will yield a false positive result. However, researchers hoping for a result will try all sorts of analyses to get the p-value below .05, partly because that makes the result much easier to publish. This is p-hacking, and it can greatly elevate the rate of false positives in the literature (a simulation sketched just after this list shows by how much).

Substantial proportions of psychologists, criminologists, applied linguists and other sorts of researchers admit to p-hacking. Nevertheless, p-hacking may be responsible for only a minority of the failures to successfully replicate previous results. Three of the other p’s below also contribute to the rate of false positives, and while researchers have tried, it’s very hard to sort out their relative importance.

2. Prevarication, which means lying, unfortunately is responsible for some proportion of the positive but false results in the literature. How important is it? Well, that’s very difficult to estimate. Within a psychology laboratory, it is possible to arrange things so that one can measure the rate at which people lie, for example to win additional money in a study, and that helps; but some of the most famous researchers to run such studies have, well, lied about their own findings. And we know that fraudsters work in many research areas, not just dishonesty research. In some areas of human endeavor, regular audits are conducted – but not in science.

3. Publication bias is the tendency of researchers to publish only the findings that they find interesting, that were statistically significant, or that confirmed what they expected based on their theoretical perspective. This has colossally distorted the literature of some fields in favor of researchers’ pet theories, producing lots of papers about phenomena that may not actually exist. Anecdotally, I have heard about psychology laboratories that used to run a dozen studies every semester and only publish the ones that yielded statistically significant results. Even in areas where researchers are always testing for effects that truly exist (are there any such fields?), publication bias results in inflated estimates of their size.

4. Low statistical power. Most studies in psychology and neuroscience are underpowered, so even if the hypotheses being investigated are true, the chance that any particular study will yield statistically significant evidence for those hypotheses is small. Thus, researchers are used to studies not working, but to get a publication, they know they need a statistically significant result. This can drive them toward publication bias as well as p-hacking. It also means that attempts to replicate published results often don’t yield a significant result even when the original result is real, making it difficult to resolve the uncertainty about what is real and what is not (the power calculation sketched after this list shows how severe the problem can be).

5. A particularly perverse practice that has developed in many sciences is pretending you predicted the results in advance. Also known as HARKing, this gives readers a much higher confidence in published phenomena and theories than they deserve. Infamously, the psychologist Daryl Bem gave students and fellow researchers the following advice:

There are two possible articles you can write: (1) the article you planned to write when you designed your study or (2) the article that makes the most sense now that you have seen the results. They are rarely the same, and the correct answer is (2).

If one follows this advice, with every study the goalpost is moved to match the interesting aspects of the data, even though pure chance is often the only cause of those interesting findings. It’s practices like this, together with publication bias and p-hacking, that are believed to be responsible for Bem’s apparent discovery that ESP is real, which he published in a prestigious social psychology journal.

6. Even when a scientific result reflects a true phenomenon rather than being spurious, it can be difficult for subsequent researchers to replicate that result. We already ran into this above with the fact that most published studies have low statistical power. Another factor is poor reporting practices (yes, I’m counting this as another ‘p’!). In their papers, researchers often do not describe their study in enough detail for other researchers to be able to duplicate what was done. For example, the Reproducibility Project: Cancer Biology initially aimed to replicate 193 experiments, but none of the experiments were described in sufficient detail in the original paper to enable the researchers to design protocols to repeat them, and for 32% of the associated papers, the authors never responded to inquiries or declined to share reagents, code, or data.
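As promised above, here are two minimal sketches to make the first and fourth P’s concrete. First, p-hacking: the simulation below (my own illustration, with arbitrary parameter choices) assumes no real effect exists and compares an honest single planned test with reporting whichever of five unrelated outcome measures happens to reach p < .05.

```r
# p-hacking sketch: no true group difference exists for any of the k outcome measures
set.seed(1)
n_sims <- 5000; n <- 30; k <- 5
p_mat <- replicate(n_sims, replicate(k, t.test(rnorm(n), rnorm(n))$p.value))
mean(p_mat[1, ] < .05)             # a single planned test: close to the nominal 5%
mean(apply(p_mat, 2, min) < .05)   # best of k = 5 tests: roughly 1 - 0.95^5, about 23%
```

Second, low statistical power: R’s built-in power.t.test shows how weak a typical small-sample design is against a modest effect (the effect size and sample size here are hypothetical, chosen only for illustration).

```r
# Power of a two-group study with 20 participants per group to detect d = 0.4
power.t.test(n = 20, delta = 0.4, sd = 1, sig.level = .05)$power   # well below the conventional 0.8 target
# Sample size per group needed to reach 80% power for the same effect
power.t.test(delta = 0.4, sd = 1, sig.level = .05, power = 0.8)$n  # roughly 100 per group
```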

The six P’s don’t exhaust the reasons for poor reproducibility. Simple errors, for example, are another cause, and such errors are surely committed both by original researchers and by replicating researchers (although replication studies seem to be held to a higher standard by journal editors and reviewers than are original studies).

Many steps have been suggested to improve the dire situation that the 6 P’s (and more) have led to. At the most relevant places for science, however, such as journals and universities, these measures are often ignored or adopted only grudgingly, so there remains a long way to go.

Introduction to reproducibility

A brief intro for research students.

Good science is cumulative

Scientists seek to create new knowledge, often by conducting an experiment or other research study.

But science is much more than doing studies and analyzing the data. Critical to the scientific enterprise is communication of what was done, and what was found.

Isaac Newton, portrait by Godfrey Kneller, 1689.

Isaac Newton, who formulated the laws of motion and gravity, wrote that “If I have seen further it is by standing on the shoulders of giants.” Newton knew that science is cumulative – we build on the findings of previous researchers.

Robert Merton, a sociologist of science, described values or norms that are endorsed by many scientists. One of these that is critical to ensuring that science is cumulative is the norm of communalism. Communalism refers to the notion that scientific methodologies and results are not the property of individuals, but rather should be shared with the world.

Sharing allows others to know the details of a previous study, which is important for:

  1. Understanding the study’s results

  2. Building on its methodology

  3. Confirming its findings

This last purpose is, arguably, essential. But across diverse sciences, ensuring that confirmation can be done, as well as actually doing it, has been neglected. This is the issue of reproducibility.

Reproducibility

Another scientific norm important to achieving reproducibility was dubbed organized skepticism by Merton. The critical community provided by other researchers is thought to be key to the success of science. The Royal Society of London for Improving Natural Knowledge, more commonly known as simply the Royal Society, was founded in 1660 and established many of the practices we today associate with science worldwide. The Latin motto of the Royal Society, “Nullius in verba”, is often translated as “Take nobody’s word for it”. 

Anyone can make a mistake, and most or all of us have biases, so scientific claims should be verifiable. The historian of science David Wootton has written that “What marks out modern science is not the conduct of experiments”, but rather “the formation of a critical community capable of assessing discoveries and replicating results.”

Types of reproducibility

Assessing discoveries and replicating results can involve two distinct types of activities. The first involves examining the records associated with a study to check for errors. The second involves re-doing the study to see whether the new data yield results that support the original claim.

The first type of activity is often referred to today with the phrase computational reproducibility. The word “computational” refers to taking the raw observations or data recorded for a study and re-doing the analysis that most directly supports the claims made.

The second activity is often referred to as replication. If a study is redone by collecting new data, does this replication study yield similar results to the original study? If very different results are obtained, this may call into question the claim of the original study, or indicate that the original study is not one that can be easily built upon.

Sometimes the word empirical is put in front of reproducibility or replication to make it clear that new data are being collected, rather than referring to computational reproducibility.

The replication crisis

The importance of reproducibility, in principle, has been recognized as critical throughout the history of science. In practice, however, many sciences have failed to adequately incentivize replication. This is one reason (later we will describe others) for the replication crisis.

The replication crisis refers to the discovery of, and subsequent reckoning with, the poor success rates of efforts to computationally reproduce or empirically replicate previous studies.

Replication crisis survey results (image).

The credibility revolution

The credibility revolution refers to the efforts by individual researchers, societies, scientific journals, and research funders to improve reproducibility. This has led to

  1. New best practices for doing individual studies

  2. Changes in how researchers and their funding applications are evaluated

  3. Greater understanding of how to evaluate the credibility of the claims of individual studies

The word credibility refers to how believable a theory or claim is. This reflects both how plausible the claim was before one heard of any relevant evidence and the strength of the evidence for it. Thus if a claim is highly credible, the probability that it is true is high. The phrase credibility revolution helps convey that reforms related to reproducibility have boosted the credibility of many scientific theories and claims.
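One way to make that combination concrete is Bayes’ rule; the sketch below uses invented numbers purely for illustration, with prior plausibility expressed as a probability and the strength of the evidence expressed as a Bayes factor.

```r
# Credibility sketch: combining prior plausibility with evidence via Bayes' rule
prior_prob   <- 0.10                   # plausibility before seeing any relevant evidence
bayes_factor <- 6                      # how strongly the new evidence favors the claim
prior_odds     <- prior_prob / (1 - prior_prob)
posterior_odds <- prior_odds * bayes_factor
posterior_odds / (1 + posterior_odds)  # credibility after the evidence: about 0.4
```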

Just a list of our VSS presentations for 2019

Topics this year: Visual letter processing; role of attention shifts versus buffering (mostly @cludowici, @bradpwyble); reproducibility (@sharoz); visual working memory (mostly @will_ngiam)

Symposium contribution, 12pm Friday 17 May:  Reading as a visual act: Recognition of visual letter symbols in the mind and brain

Implicit reading direction and limited-capacity letter identification

ebmocloH xelA, The University of Sydney
(abstract now has better wording)
I would like to congratulate you for reading this sentence. Somehow you dealt with a severe restriction on simultaneous identification of multiple objects – according to the influential “EZ reader” model of reading, humans can identify only one word at a time. Reading text apparently involves a highly stereotyped attentional routine with rapid identification of individual stimuli, or very small groups of stimuli, from left to right. My collaborators and I have found evidence that this reading routine is elicited when just two widely-spaced letters are briefly presented and observers are asked to identify both letters. A large left-side performance advantage manifests, one that is absent or reversed when the two letters are rotated to face to the left instead of to the right. Additional findings from RSVP (rapid serial visual presentation) lead us to suggest that both letters are attentionally selected simultaneously, with the bottleneck at which one letter is prioritized sited at a late stage – likely at an identification or working memory consolidation process. Thus, a rather minimal cue of letter orientation elicits a strong reading direction-based prioritization routine. Our ongoing work seeks to exploit this to gain additional insights into the nature of the bottleneck in visual identification and how reading overcomes it.

 

Is there a reproducibility crisis around here? Maybe not, but we still need to change.

Alex O Holcombe1, Charles Ludowici1, Steve Haroz2

Poster 2:45pm Sat 18 May

1School of Psychology, The University of Sydney

2Inria, Saclay, France

Those of us who study large effects may believe ourselves to be unaffected by the reproducibility problems that plague other areas. However, we will argue that initiatives to address the reproducibility crisis, such as preregistration and data sharing, are worth adopting even under optimistic scenarios of high rates of replication success. We searched the text of articles published in the Journal of Vision from January through October of 2018 for URLs (our code is here: https://osf.io/cv6ed/) and examined them for raw data, experiment code, analysis code, and preregistrations. We also reviewed the articles’ supplemental material. Of the 165 articles, approximately 12% provide raw data, 4% provide experiment code, and 5% provide analysis code. Only one article contained a preregistration. When feasible, preregistration is important because p-values are not interpretable unless the number of comparisons performed is known, and selective reporting appears to be common across fields. In the absence of preregistration, then, and in the context of the low rates of successful replication found across multiple fields, many claims in vision science are shrouded by uncertain credence. Sharing de-identified data, experiment code, and data analysis code not only increases credibility and ameliorates the negative impact of errors, it also accelerates science. Open practices allow researchers to build on others’ work more quickly and with more confidence. Given our results and the broader context of concern by funders, evident in the recent NSF statement that “transparency is a necessary condition when designing scientifically valid research” and “pre-registration… can help ensure the integrity and transparency of the proposed research”, there is much to discuss.

 

Talk 2:30pm Saturday 18 May

A delay in sampling information from temporally autocorrelated visual stimuli
Chloe Callahan-Flintoft1, Alex O Holcombe2, Brad Wyble1
1Pennsylvania State University
2University of Sydney
Understanding when the attentional system samples from continuously changing input is important for understanding how we build an internal representation of our surroundings. Previous work looking at the latency of information extraction has found conflicting results. In paradigms where features such as color change continuously and smoothly, the color selected in response to a cue can come from as long as 400 ms after the cue (Sheth, Nijhawan, & Shimojo, 2000). Conversely, when discrete stimuli such as letters are presented sequentially at the same location, researchers find selection latencies under 25 ms (Goodbourn & Holcombe, 2015). The current work proposes an “attentional drag” theory to account for this discrepancy. This theory, which has been implemented as a computational model, proposes that when attention is deployed in response to a cue, smoothly changing features temporally extend attentional engagement at that location whereas a sudden change causes rapid disengagement. The prolonged duration of attentional engagement in the smooth condition yields longer latencies in selecting feature information.
In three experiments participants monitored two changing color disks (changing smoothly or pseudo-randomly). A cue (white circle) flashed around one of the disks. The disks continued to change color for another 800 ms. Participants reported the disk’s perceived color at the time of the cue using a continuous scale. Experiment 1 found that when the color changed smoothly there was a larger selection latency than when the disk’s color changed randomly (112 vs. 2 ms). Experiment 2 found this lag increased with an increase in smoothness (133 vs. 165 ms). Finally, Experiment 3 found that this later selection latency is seen when the color changes smoothly after the cue but not when the smoothness occurs only before the cue, which is consistent with our theory.

 

Poster 2pm 20 May

Examining the effects of memory compression with the contralateral delay activity
William X Ngiam1,2, Edward Awh2, Alex O Holcombe1
1School of Psychology, University of Sydney
2Department of Psychology, University of Chicago
While visual working memory (VWM) is limited in the amount of information that it can maintain, it has been found that observers can overcome the usual limit using associative learning. For example, Brady et al. (2009) found that observers showed improved recall of colors that were consistently paired together during the experiment. One interpretation of this finding is that statistical regularities enable subjects to store a larger number of individuated colors in VWM. Alternatively, it is also possible that performance in the VWM task was improved via the recruitment of LTM representations of well-learned color pairs. In the present work, we examine the impact of statistical regularities on contralateral delay activity (CDA) that past work has shown to index the number of individuated representations in VWM. Participants were given a bilateral color recall task with a set size of either two or four. Participants also completed blocks with a set size of four where they were informed that colors would be presented in pairs and shown which pairs would appear throughout, to encourage chunking of the pairs. We find this explicit encouragement of chunking improved memory recall but that the amplitude of the CDA was similar to the unpaired condition. Xie and Zhang (2017; 2018) previously found evidence that familiarity produces a faster rate of encoding as indexed by the CDA at an early time window, but no difference at a late time window. Using the same analyses on the present data, we instead find no differences in the early CDA, but differences in the late CDA. This result raises interesting questions about the interaction between the retrieval of LTM representations and what the CDA is indexing.

 

Poster 2:45pm Tues 21 May

Selection from concurrent RSVP streams: attention shift or buffer read-out?
Charles J H Ludowici, Alex O. Holcombe
School of Psychology, The University of Sydney, Australia
Selection from a stream of visual information can be elicited via the appearance of a cue. Cues are thought to trigger a time-consuming deployment of attention that results in selection for report of an object from the stream. However, recent work using rapid serial visual presentation (RSVP) of letters finds reports of letters just before the cue at a higher rate than is explainable by guessing. This suggests the presence of a brief memory store that persists rather than being overwritten by the next stimulus. Here, we report experiments investigating the use of this buffer and its capacity. We manipulated the number of RSVP streams from 2 to 18, cued one at a random time, and used model-based analyses to detect the presence of attention shifts or buffered responses. The rate of guessing does not seem to change with the number of streams. There are, however, changes in the timing of selection. With more streams, the stimuli reported are later and less variable in time, decreasing the proportion reported from before the cue. With two streams – the smallest number of streams tested – about a quarter of non-guess responses come from before the cue. This proportion drops to 5% in the 18 streams condition. We conclude that it is unlikely that participants are using the buffer when there are many streams, because of the low proportion of non-guesses from before the cue. Instead, participants must rely on attention shifts.

 

Survey of vision researchers: 2016 results on open access

Salvaged from an early Feb 2016 Google+ posting:
A discussion among vision researchers happened on a semi-private email group (CVnet), but you can see some of the discussion in the visionlist archive (by moving around from here: http://visionscience.com/pipermail/visionlist/2016/009312.html), and below you can see the results of the survey.

Dear vision researchers,

A while ago I circulated a survey about open access and publishing, one that was oriented largely towards the issues raised in the initial CVnet emails. The survey was only open for a few weeks, but 380 of you responded.

Here are the raw data: https://docs.google.com/spreadsheets/d/1tfpSVeLflOG4moGvhHlT2SivnW5Rqw-upGrwLZkqEcA/edit?usp=sharing and here is an automatic Google-generated summary: https://docs.google.com/forms/d/1vhKwMkTCpm3DZGq2SGmd8_cNXXBv344Lo8XWtyDQXho/viewanalytics

I don’t want to be seen as biasing interpretation of the survey, but it seems safe to say that the large number of responses, and the data, show that many of us have opinions about these issues and want something done. The first question was “Which financial/organizational aspect of journals should be the community’s top priority?” and of the six options provided, the most popular answer was

“open access”, with 132 responses
“Full academic or professional society control” was 2nd with 78 responses
“Low cost” was 3rd, with 61 responses

To “What should the vision community do NOW?”, 1st was
“Make a change (choosing this will lead to some possible options)” with 353 votes
“Nothing, carry on as normal” was the other option and received 24 votes.

Those 353 pro-change respondents were shown multiple options for change, and could choose more than one. There was a strong vote for several, with the leading ones being
“Encourage the VR Editorial Board to jump ship” with 164 votes and
“Encourage the JoV Editorial Board to jump ship” with 160 votes.
Note there was also significant support for the MDPI Vision journal (137 votes) and
“Redirect our submissions and support to i-Perception” (106 votes).

To “What should the academics on the editorial boards of overpriced journals (be they subscription or open access) do?”,
“Work with the publisher to reform the journal itself” had 214 votes, followed by
“Wait until a majority or supermajority of editors agree to resign, and then resign en masse, with a plan agreed among the editors to join or start something else” with 90 votes

There was one other question, about desired features of journals; please go to the data to check out the options and responses https://docs.google.com/forms/d/1vhKwMkTCpm3DZGq2SGmd8_cNXXBv344Lo8XWtyDQXho/viewanalytics

Given the large number of responses and the overwhelming vote for “make a change” (93%), I hope that the editors of our journals will respond to this survey data and the related CVnet discussion, including the question of how authors without funds can publish open access. Very likely, the editorial boards of journals have been discussing these issues behind the scenes for a few weeks, and it is understandable that reaching consensus on how a journal can respond will take time. As a result, editors’ responses are likely to occur at different times, resulting in a wandering discussion that will exhaust many of us and might focus criticism or praise overmuch on an individual journal.

So that our discussion is less piecemeal, the CVnet moderator, Hoover Chan, has agreed that if editors send their responses directly to him, he will collate the responses and send them out as a batch on 21 February (3 weeks from now).

Most of the discussion so far has centred on JoV and Vis Res, but there are other vision journals, such as Perception/i-Perception (which it was nice to hear from just now), AP&P, Frontiers in Perception Science, MDPI Vision, Multisensory Research, and JEP:HPP; it would be good if we could have responses from all of them.

Perhaps the most salient question raised both by the survey responses and the CVnet discussion is exactly why each journal is as expensive/cheap as it is, particularly its open access option, and whether each journal will provide transparent accounting of costs. Given that the data indicate that “Full academic or professional society control” is a high priority, editors should also comment on the ability of themselves and the rest of us to affect their journal’s policies, features and cost.


Psychonomics Society and Perception/i-Perception on open access

A post from 27 Feb 2016 salvaged from Google+:

A discussion about the high author fees charged by some #openaccess journals brought up many other issues, some of which were included in a survey. Nearly 400 vision researchers responded to the survey, of which 93% expressed desire for change. See the detailed response breakdown here: https://plus.google.com/u/0/+AlexHolcombe/posts/71QRT2grZKt .

When I reported these survey results to the community mailing list (CVnet), I invited journal editors and publisher representatives to respond, saying that their responses would be sent out after 3 weeks. Here are their responses:

From: Cathleen Moore (cathleen-moore@uiowa.edu)

I am writing on behalf of the Psychonomic Society in regard to the recent journal survey results that have been distributed throughout our community. We would like to offer the following statement as the outcome of discussions within the Executive Committee and the Publications Committee. We would be grateful if you would include this in your communications to the community regarding the recent survey results and surrounding discussion.

The mission of the Psychonomic Society is “…the communication of scientific research in psychology and allied sciences.”
(http://www.psychonomic.org/about-us). That is, communication of the science is the very purpose for our existence. As such, the Psychonomic
Society is committed to making membership in the society, the annual meeting, and all of our journals affordable for all. Open-access publishing is one aspect of the Society’s commitment, as evident in the establishment of our new open-access journal Cognitive Research: Processes and Implications.

Discussions about open access and other models of publishing are ongoing, and will be part of the formal agenda at future meetings of the Governing
Board later this year.

Sincerely,

Cathleen Moore
Chair, Governing Board

In consultation with:
Aaron Benjamin, Chair Elect
Bob Logie, Past Chair
Fernanda Ferreira, Publications Committee Chair
——————————————————-
From: Dennis Levi (dlevi@berkeley.edu)

The topic of open access will be a major discussion issue for the JOV [Journal of Vision] board meeting at VSS in May.
————————————————————————–
From: Timothy Meese (t.s.meese@aston.ac.uk)

Dear Vision Scientists

We at i-Perception and SAGE are pleased to respond to the issues raised in the recent discussion of open access on CVNet. We circulated a general response over CVNet shortly before Alex Holcombe circulated the results of his survey and the invitation to respond to those. We have appended our earlier circular to this email for completeness and for contact details.

** SURVEY RESPONSE **

OPEN ACCESS (OA)
i-Perception (iP) is a fully open access journal with papers published under a CC-BY license. As the survey was about OA, most of our response relates to iP. However, iP’s sister print journal Perception (P) includes some material that is also open access. We list that here for completeness: Editorials, the Perception lecture (from ECVP), some conference abstracts, and some of the back archive. The journal Perception can be accessed here: http://pec.sagepub.com

COSTS
Costs at iP are clearly competitive (375 GBP [~568 USD] for regular articles; see below for further details). We can confirm that these costs will be fixed through 2016. They will be reviewed in 2017 to ensure ongoing viability for all stakeholders.

JOURNAL REFORM AND FULL ACADEMIC CONTROL
Regarding academic control, iP Chief Editors meet with SAGE three times a year, and there is also an annual Editorial/Advisory Board meeting at ECVP, with a representative from SAGE. It is our impression, confirmed during our board meetings at ECVP, that iP is not viewed as overpriced and that reform on this matter is not being sought at this time. However, we add that the Chief Editors will do what they can to keep costs down. We would also like to point out that the Chief Editors are always open to suggestions (by email or in person), which can be taken forward to management board meetings for further discussion. Although subject to certain constraints (e.g. the limitations of generic company software packages such as ScholarOne), we have found SAGE to be very accommodating to our requests and suggestions thus far.

TRANSPARENT ACCOUNTING
The sum that an author pays for publication has two components: 1. Internal production costs (non profit). 2. Profit. For large organisations, isolating item 1 is quite tricky—for example, should this be averaged over all the publisher’s journals or just the relevant journal? As different journals adopt different approaches, comparisons of components 1 and 2 are likely to be problematic. However, the TOTAL cost that an author pays (page charges, OA/CC-BY charges, any other charges and fees) allows for unambiguous comparisons.

OPEN REVIEW AND POST-PUBLICATION PEER REVIEW
At present SAGE do not do this for any of their journals. There is no immediate plan to do so, but SAGE are keeping their eye on the situation for open review. As for post-publication peer review, we do have a section in iP called ‘Journal Club’ which is intended for published discussion of other people’s publications. This, we believe, is the best way to implement relevant post-publication peer review.

REGISTERED REPORTS FORMAT
This allows authors to register the format of their study before data are gathered. This can be valuable in justifying the chosen statistical analysis and also for reporting null results. This is something that several SAGE journals do and will be a subject for discussion between SAGE and the chief editors of iP at their next managerial board meeting in June.

OPENNESS BADGES (COS) https://osf.io/tvyxz/wiki/home/
This was first raised at our Editorial/Advisory Board meeting held at ECVP in 2013 and then discussed in detail by that board the following year after circulating a detailed paper on the matter. While the value of these badges was acknowledged for other journals and disciplines, their value for vision/perception research was viewed as questionable and there was little or no enthusiasm at the meeting for adopting the COS badges at that time. However, that decision is open to review, particularly in light of the item above.

OPEN JOURNAL TIME FRAMES
At present, SAGE do not report submission and acceptance dates for articles in iP, but this is something that will change in the near future. We are also looking into whether it is possible to make average review times for the journal available on the website.

COPE MEMBERSHIP
COPE membership for Perception and i-Perception is currently being processed and we expect to be able to acknowledge this on the website very soon.

COPYEDITING
SAGE provide copyediting and typesetting. Authors see the copyedited and typeset proofs for any final corrections before publication.

LATEX
We will be able to accept LaTeX submissions very soon.

ALERTS
To register with iP and/or sign up to receive an email alert for each new issue go here: http://ipe.sagepub.com/cgi/alerts

Signed
Chief editors of Perception and iPerception:
Tim Meese
Peter Thompson
Frans Verstraten
Johan Wagemans
SAGE:
Ellie Craven

________________________________________________
APPENDIX A (The email below was first circulated over CVNet on 1st February 2016)

There is a new OA journal already…

…It is i-Perception.

WHAT IT IS
i-Perception (or iPerception) was founded in 2010, and is the OA, peer-reviewed, online sister journal to the long-running print journal, Perception, founded by Richard Gregory in 1972.

BACKGROUND
For many years both journals were owned by the UK publisher Pion but have recently been taken on by SAGE. As editors, we have enjoyed positive relations with both publishers regarding all aspects of the journal. Although the shift to SAGE has meant the loss of the much beloved submission system, PiMMS (we now use ScholarOne) and ‘paper icons’ on the contents page of iPerception, we are now enjoying the benefits of efficiency and outreach that comes with a larger publisher, and one that we have found to be sensitive to the needs and views of the journals’ editors and authors.

Perception and iPerception (we often abbreviate the two journals to PiP) are journals run by vision/perceptual/sensory scientists for vision/perceptual/sensory scientists. For example, PiP have a long standing history in supporting major vision conferences (APCV, ECVP, VSS), but particularly ECVP, where they are the chosen outlet for published conference abstracts and sponsors of the keynote ‘Perception’ lecture on the opening evening.

REMIT
The remit of both journals is the same: any aspect of perception (human, animal or machine), with an emphasis on experimental work; but theoretical positions, reviews and comments are also considered for publication (see website below for details of the various paper categories). Although the majority of the papers published are on visual perception, all other aspects of perception are also covered, including multi-modal studies, giving it a broader remit than either VR or JoV. PiP is sometimes thought to focus on phenomenology (owing to the interests of its founding editor, we think), but hardcore psychophysics is also found within its pages, and much of what is published in VR or JoV would not be out of place in PiP.

EDITORIAL BOARD
Although the two journals are independent (e.g. they have their own impact factors; the IF for iP is 1.482), they are overseen by a common international editorial board who can be found by following the third link at the bottom of this page. An editorial board meeting takes place annually at ECVP.
The four chief editors (based in Europe/Australia, see below) and the administrative manager (Gillian Porter, based in Bristol, UK) hold managerial board meetings three times a year with SAGE (based in London, UK) and enjoy a close working relationship with an open door (email) policy.

COPYRIGHT AND OPEN ACCESS
Papers in iP are published under a CC-BY license (https://en.wikipedia.org/wiki/Creative_Commons_license) (this is Gold OA). Papers in PiP are also branded Romeo Green (http://www.sherpa.ac.uk/romeoinfo.html). (This is a different branding system from the more familiar Gold/Green OA terminology, and the two should not be confused.) Romeo Green status enables authors to archive their accepted version in their institutional repository, their departmental website, and their own personal website immediately upon acceptance. This is the most open publishing policy possible, of which SAGE (and we) are justly proud.
If your library does not stock Perception, you might think to request that they do—the bundles with which it is included are likely to have changed with the change in publisher (to SAGE).

COSTS
There is no cost to publishing in Perception.
The cost for publishing in iPerception is a single charge (on acceptance of the paper), depending on paper type as follows:
Regular Articles = 375 GBP (~ 568 USD)
Short Reports = 200 GBP (~ 303 USD)
Translation Articles = 200 GBP (~ 303 USD)
Short and Sweet = 150 GBP (~227 USD)
Journal Club = 150 GBP (~227 USD)
iReviews = no charge.

VAT (value added tax) at 20% is added to the costs above if the paying party is in the European Union, to comply with European Law. Non-UK institutions are exempt from VAT if they can provide a VAT registration number.

THE FUTURE
We have been watching the debate on OA over CVNet with interest; we agree that a low-cost OA option is a desirable forum for our community—preferably one based around a dedicated journal so as to provide a sense of ‘home’ rather than an unfocussed (rebellious, even) out camp—and we hope you will join us to help bring iP towards the forefront of that endeavour.

JOURNAL LINKS

To see content of iPerception follow the link below…
http://ipe.sagepub.com/

To submit to Perception or iPerception follow the link below to ScholarOne…
https://mc.manuscriptcentral.com/i-perception

To see details about iPerception follow the link below…
https://uk.sagepub.com/en-gb/eur/i-perception/journal202441

Signed (Chief editors of Perception and iPerception)
Tim Meese
Peter Thompson
Frans Verstraten
Johan Wagemans

Several outside organizations associated with journals have asked about this.…