Registered Replication Reports are open for submissions!

Science is broken; let’s fix it. This has been my mantra for some years now, and today we are launching an initiative aimed squarely at one of science’s biggest problems. The problem is called publication bias or the file-drawer problem and it’s resulted in what some have called a replicability crisis.

When researchers do a study and get negative or inconclusive results, those results usually end up in file drawers rather than in the published literature. When this is true for studies attempting to replicate already-published findings, we end up with a replicability crisis in which people don’t know which published findings can be trusted.

To address the problem, Dan Simons and I are introducing a new article format at the journal Perspectives on Psychological Science (PoPS): the Registered Replication Report (RRR). The process will begin with a psychological scientist interested in replicating an already-published finding. They will explain to us editors why they think replicating the study would be worthwhile (perhaps it has been widely influential but has had few or no published replications). If we agree, they will be invited to submit a methods section and analysis plan. The submission will be sent to reviewers, preferably the authors of the original article proposed for replication, who will be asked to help the replicating authors ensure their method is nearly identical to that of the original study. The submission will then be accepted or rejected, and the authors will be asked to report back when the data come in. The methods will also be made public, and other laboratories will be invited to join the replication attempt. All the results will be posted in the end, with a meta-analytic estimate of the effect size combining all the data sets (including the original study’s data if it is available). Some of this will be posted on the Open Science Framework website. The press release is here, and the details can be found at the PoPS website.

Professor Daniel J. Simons (University of Illinois) and I are co-editors for the RRRs. The chief editor of Perspectives on Psychological Science is Barbara A. Spellman (University of Virginia), and leadership and staff at the Association for Psychological Science, especially Eric Eich and Aime Ballard, have also played an important role (see their press release).

Three features make RRRs very different from the usual way that science gets published:

1. Preregistration of the replication study’s design, analysis plan, and statistical tests BEFORE the data are collected.

  • Normally researchers have a disincentive to do replication studies because they are usually difficult to publish. Here we circumvent the usual obstacles to replications by giving researchers a guarantee, before they do the study, that their replication will be published (provided they meet the conditions agreed upon during the review process).
  • There will be no experimenter degrees of freedom to analyse the data in multiple ways until a significant but likely spurious result is found. This is particularly important for complex designs or multiple outcome variables, where those degrees of freedom otherwise allow one almost always to achieve a significant result. Not here.

2. Study is sent for review to the original author on the basis of the plan, BEFORE the data come in.

  • Unlike standard replication attempts, where the author of the original published study sees the replication only after the results come in, we will involve the original author at an early stage. Many will provide constructive feedback to help perfect the planned protocol so that it has the best chance of reproducing the already-published target effect.

3. The results will not be presented as a “successful replication” or “failed replication”. Rarely is any one data set definitive by itself, so we will concentrate on making a cumulative estimate of the relevant effect’s size, together with a confidence interval or credibility interval.

  • This will encourage people to make more quantitative theories aimed at predicting a certain effect size, rather than only worrying about whether the null hypothesis can be rejected (as we know, the null hypothesis is almost never true, so can almost always be rejected if one gets enough data).
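To give a concrete flavour of the cumulative estimate described in point 3, here is a minimal sketch in R of an inverse-variance (fixed-effect) combination of effect sizes from several replication data sets. The numbers are invented purely for illustration, and the actual RRR analyses may well use a different model (for example, random effects).

# Combine standardized effect sizes from several labs with
# inverse-variance (fixed-effect) weights. The effect sizes and
# standard errors below are invented purely for illustration.
d  <- c(0.45, 0.10, 0.22, 0.31)   # effect size estimate (e.g., Cohen's d) per lab
se <- c(0.20, 0.15, 0.18, 0.25)   # standard error of each estimate

w       <- 1 / se^2               # inverse-variance weights
d_meta  <- sum(w * d) / sum(w)    # weighted mean effect size
se_meta <- sqrt(1 / sum(w))       # standard error of the combined estimate

ci95 <- d_meta + c(-1, 1) * 1.96 * se_meta   # 95% confidence interval
round(c(estimate = d_meta, lower = ci95[1], upper = ci95[2]), 3)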

This initiative is the latest step in a long journey for me. Ten years ago, thinking that allowing comments to be posted on published papers would bring flaws and missed connections to light much earlier, David Eagleman and I published a letter to that effect in Nature and campaigned (unsuccessfully) for commenting to be allowed on PubMed abstracts.

Since then, we’ve seen that even where comments are allowed, few scientists make them, probably because there is little incentive to do so and doing it would risk antagonising their colleagues. In 2007 I became an academic editor and advisory board member for  PLoS ONE, which poses fewer obstacles to publishing replication studies than do most journals. I’m lucky to have gone along on the ride as PLoS ONE rapidly became the largest journal in the world (I resigned my positions at PLoS ONE to make time for the gig at PoPS). But despite the general success of PLoS ONE, replication studies were still few and far between.

In 2011, Hal Pashler, Bobbie Spellman, Sean Kang and I started PsychFileDrawer, a website for researchers to post notices about replication studies. It has enjoyed some success, but it seems that without the carrot of a published journal article, few researchers will upload results, or perhaps even conduct replication studies.

Finally, with this Perspectives on Psychological Science initiative, a number of things have come together to overcome the main obstacles to replication studies: fear of antagonising other researchers and the uphill battle required to get the study published. Some other worthy efforts to encourage replication studies are happening at Cortex and BMC Psychology.

If you’re interested in proposing to conduct a replication study for eventual publication, check out the instructions and then drop us a line at replicationseditor @ psychologicalscience.org!

Protect yourself during the replicability crisis of science

Scientists of all sorts increasingly recognize the existence of systemic problems in science, and that as a consequence of these problems we cannot trust the results we read in journal articles. One of the biggest problems is the file-drawer problem. Indeed, it is mostly as a consequence of the file-drawer problem that in many areas most published findings are false.

Consider preclinical cancer bench research, just as an example. The head of cancer research at Amgen tried to replicate 53 landmark papers. He could not replicate 47 of the 53 findings.

In experimental psychology, a rash of articles has pointed out the causes of false findings, and a replication project that will dwarf Amgen’s is well underway. The drumbeat of bad news will only get louder.

What will be the consequences for you as an individual scientist? Field-wide reforms will certainly come, partly because of changes in journal and grant funder policies. Some of these reforms will be effective, but they will not arrive fast enough to halt the continued decline of the reputation of many areas.

In the interim, more and more results will be viewed with suspicion. This will affect individual scientists directly, including those without sin. There will be:

  • increased suspicion by reviewers and editors of results in submitted manuscripts (“Given the history of results in this area, shouldn’t we require an additional experiment?“)
  • lower evaluation of job applicants for faculty and postdoctoral positions (“I’ve just seen too many unreliable findings in that area“)
  • lower scores for grant applications (“I don’t think they should be building on that paper without more pilot data replicating it“)

These effects will be unevenly distributed. They will often manifest as exaggerations of existing biases. If a senior scientist already had a dim view of social psychology, for example, the continuing replicability crisis will likely magnify that bias, whereas the fields that scientist “trusts” will not be as affected by the whiff of scandal, at least for a while; people have a way of making excuses for themselves and their friends.

But there are some things you can do to protect yourself. These practices will eventually become widespread. But get a head start, and look good by comparison.

  • Preregister your study hypotheses, methods, and analysis plan. If you go on record with your plan before you do the study, this will allay the suspicion that your result is not robust, i.e. that you fished around with techniques and statistics until you got a statistically significant result. Journals will increasingly endorse a policy of favoring submitted manuscripts that have preregistered their plan in this way. Although websites set up to take these plans may not yet be available in your field, they are coming, and in the meantime you can post something on your own website, on FigShare perhaps, or in your university’s publicly accessible e-repository.
  • Post your raw data (where ethically possible), experiment code, and analysis code to the web. This says you’ve got nothing to hide. No dodgy analyses, and you welcome the contributions of others to improve your statistical practices.
  • Post all pilot data, interim results, and everything you do to the web, as the data come in. This is the ultimate in open science. You can link to your “electronic laboratory notebooks” in your grants and papers. Your reviewers will have no excuse to harbor dark thoughts about how your results came about, when they can go through the whole record.

The proponents of open science are sometimes accused of being naifs who don’t understand that secretive practices are necessary to avoid being scooped, or that sweeping inconvenient results under the rug is what you’ve got to do to get your results into those high-impact-factor journals. But the lay of the land has begun to change.

Make way for the cynics! We are about to see people practice open science not out of idealism, but rather out of self interest, as a defensive measure. All to the better of science.

VSS 2012 abstracts, and Open satellite

Below are research presentations I’m involved in for the Vision Sciences Society meeting in May. If you’re attending VSS, don’t forget about the Publishing, Open Access, and Open Science satellite, which will be held on Friday at 11 am. Let us know your opinion on the issues and what should be discussed here.

Splitting attention slows attention: poor temporal resolution in multiple object tracking

Alex O. Holcombe, Wei-Ying Chen

Session Name: Attention: Tracking (Talk session)

Session Date and Time: Sunday, May 13, 2012, 10:45 am – 12:30 pm

Location: Royal Ballroom 4-5

When attention is split into foci at disparate locations, the minimum size of the selection focus at each location is larger than if only one location is targeted (Franconeri, Alvarez, & Enns, 2007): splitting attention reduces its spatial resolution. Here we tested temporal resolution and speed limits. STIMULUS. Three concentric circular arrays (separated by large distances to avoid spatial interactions between them) of identical discs were centered on fixation. Up to three discs (one from each ring) were designated as targets. The discs orbited fixation at a constant speed, occasionally reversing direction. After the discs stopped, participants were prompted to report the location of one of the targets. DESIGN. Across trials, the speed of the discs and the number in each array were varied, which jointly determined the temporal frequency. For instance, with 9 objects in the array, a speed of 1.1 rps would be 9.9 Hz. RESULTS. With only one target, tracking was not possible above about 9 Hz, far below the limits for perceiving the direction of the motion, and consistent with Verstraten, Cavanagh, & LaBianca (2000). The data additionally suggest a speed limit, with tracking impossible above 1.8 rps, even when temporal frequency was relatively low. Tracking two targets could only be done at lower speeds (1.4 rps) and lower temporal frequencies (6 Hz). This decrease is approximately that predicted if at high speeds and high temporal frequencies, only a single target could be tracked. Tracking three yielded still lower limits. Little impairment was seen at very slow speeds, suggesting these results were not caused by a reduction in spatial resolution. CONCLUSION. Splitting attention reduces the speed limits and the temporal frequency limits on tracking. We suggest a parallel processing resource is split among targets, with less resource on a target yielding poorer spatial and temporal precision and slower maximum speed.
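As an aside, the temporal frequency mentioned in the abstract is simply the number of discs in a ring multiplied by the rotational speed; a quick R sketch of that arithmetic, using the values quoted above:

# Temporal frequency of the tracking display: with n_objects identical
# discs per ring rotating at speed_rps revolutions per second,
# n_objects * speed_rps discs pass any fixed location each second.
n_objects <- 9          # discs per ring (value from the abstract)
speed_rps <- 1.1        # rotational speed in revolutions per second
n_objects * speed_rps   # temporal frequency in Hz: 9.9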

A hemisphere-specific attentional resource supports tracking only one fast-moving object.

Wei-Ying Chen & Alex O. Holcombe

Session Name: Attention: Tracking (Talk session)

Session Date and Time: Sunday, May 13, 2012, 10:45 am – 12:30 pm

Location: Royal Ballroom 4-5

Playing a team sport or taking children to the beach involves tracking multiple moving targets. Resource theory asserts that a limited resource is divided among targets, and performance reflects the amount available per target. Holcombe and Chen (2011) validated this with evidence that tracking a fast-moving target depletes the resource. Using slow speeds, Alvarez and Cavanagh (2005) found the resource consumed by additional targets is hemisphere-specific. They didn’t test the effect of speed, and here we tested whether speed also depletes a hemisphere-specific resource. To put any speed limit cost in perspective, we modeled a “total depletion” scenario: the speed limit cost if at high speeds one could not track the additional target at all and had to guess about one target. Experiment 1 found that the speed limit for tracking two targets in one hemifield was similar to that predicted by total depletion, suggesting that the resource was totally depleted. If the second target was instead placed in the opposite hemifield, little decrement in speed limit occurred. Experiment 2 extended this comparison to tracking two vs. four targets. Compared to the speed limit for tracking two targets in a single hemifield, adding two more targets in the opposite hemifield left the speed limit largely unchanged. However, starting with one target in each of the left and right hemifields, adding another to each hemifield had a severe cost similar to that of the total depletion model. Both experiments support the theory that an object moving very fast exhausts a hemisphere-specific attentional tracking resource.

Attending to one green item while ignoring another: Costly, but with curious effects of stimulus arrangement

Shih-Yu Lo & Alex O. Holcombe

Session Name: Attention: Features I (Poster session)

Session Date and Time: Monday, May 14, 2012, 8:15 am – 12:15 pm

Location: Vista Ballroom

Splitting attention between targets of different colors is not costly by itself. As we found previously, however, monitoring a target of a particular color makes one more vulnerable to interference by distracters that share the target color. Participants monitored the changing spatial frequencies of two targets of either the same (e.g., red and red) or different colors (e.g., red and green). The changing stimuli disappeared without warning and participants reported the final spatial frequency of one of the targets. In the different-colors condition, a large cost occurs if a green distracter is superposed on the red target in the first location and a red distracter is superposed on the green target in the second location. This likely reflects a difficulty with attending to a color in one location while ignoring it in another. Here we focus on a subsidiary finding regarding perceptual lags. Participants reported spatial frequency values from the past rather than the correct final value, and such lags were greater in the different-colors condition. This “perceptual lag” cost was found when the two stimuli were horizontally arrayed but not, curiously, when they were vertically arrayed. Arrangement was confounded, however, with processing by separate brain hemispheres (opposite hemifields). In our new study, we unconfounded arrangement and presentation in separate hemifields with a diagonal condition: targets were not horizontally arrayed but were still presented to different hemifields. No significant different-colors lag cost was found in this diagonal arrangement (5 ms) or in the vertical arrangement (86 ms), but the cost (167 ms) was significant in the horizontal arrangement, as in previous experiments. Horizontal arrangement apparently has a special effect apart from the targets being processed by different hemispheres. To speculate, this may reflect sensitivity to bilateral symmetry and its violation when the target colors are different.

Dysmetric saccades to targets moving in predictable but nonlinear trajectories

Reza Azadi, Alex Holcombe, and Jay Edelman

Poster

A saccadic eye movement to a moving object requires taking both the object’s position and velocity into account. While recent studies have demonstrated that saccades can do this quite well for linear trajectories, the saccadic system’s ability to do so for stimuli moving in more complex, yet predictable, trajectories is unknown. With objects moving in circular trajectories, we document failures of saccades not only to compensate for target motion, but even to saccade successfully to any location on the object trajectory. While maintaining central fixation, subjects viewed a target moving in a circular trajectory at an eccentricity of 6, 9, or 12 deg for 1-2 sec. The stimulus orbited fixation at a rate of 0.375, 0.75, or 1.5 revolutions/sec. The disappearance of the central fixation point cued the saccade. Quite unexpectedly, the circularly moving stimuli substantially compromised saccade generation. Compared with saccades to non-moving targets, saccades to circularly moving targets at all eccentricities had substantially lower amplitude gains, greater curvature, and longer reaction times. Gains decreased by 20% at 0.375 cycles/sec and more than 50% at 1.5 cycles/sec. Reaction times increased by over 100 ms for 1.5 cycles/sec. In contrast, the relationship between peak velocity and amplitude was unchanged. Given the delayed nature of the saccade task, the system ought to have sufficient time to program a reasonable voluntary saccade to some particular location on the trajectory. But the abnormal gain, curvature, and increased reaction time indicate that something else is going on. The successive visual transients along the target trajectory perhaps engage elements of the reflexive system continually, possibly engaging vector averaging processes and preventing anticipation. These results indicate that motor output can be inextricably bound to sensory input even during a highly voluntary motor act, and thus suggest that current understanding of reflexive vs. voluntary saccades is incomplete.

PsychFileDrawer blog, commenting on Association for Research in Personality newsletter

I’ve started blogging at PsychFileDrawer.

One of our first posts is addressed to the Association for Research in Personality newsletter:

Regarding your article entitled “Personality Psychology Has a Serious Problem (And so Do Many Other Areas of Psychology)”,

We agree wholeheartedly with your diagnosis of a major problem in publication practices in psychology. As you explain, any solution has to include a reduction in the systematic bias against publishing non-replications that now exists. Such a bias seems to be present in the editorial practices of all of the major psychology journals.  In addition, discussions with colleagues lead us to believe that investigators themselves tend to lose interest in a phenomenon when they fail to replicate a result, partly because they know that publishing negative findings is likely to be difficult and writing the manuscript time-consuming.  Given these biases, it seems inevitable that our literature and even our textbooks are filling with fascinating “findings” that lack validity.  Read the rest at the PsychFileDrawer blog.

Any ideas for enticing people to contribute replication attempts to PsychFileDrawer will be gratefully received!

Top 15 most popular laws in psychology journal abstracts

How many of these laws do you know? The top 15, listed below, are based on psychology journal articles 1900-1999, as calculated by Teigen (2002):

LAW (REFERENCE): NUMBER OF MENTIONS
1. Weber’s law (Weber 1834): 336
2. Stevens’ power law (Stevens 1957): 241
3. Matching law (Herrnstein 1961): 183
4. Law of effect (Thorndike 1911): 177
5. Fechner’s law (Fechner 1860): 100
6. Fitts’ law (Fitts 1954): 82
7. Law of initial values (Wilder 1957): 82
8. Law of comparative judgment (Thurstone 1927): 72
9. Yerkes-Dodson law (Yerkes & Dodson 1908): 52
10. All-or-none law (Bowditch 1871): 45
11. Emmert’s law (Emmert 1881): 43
12. Bloch’s law (Bloch 1885): 41
13. Gestalt laws (Wertheimer 1923): 41
14. Hick’s law (Hick 1952): 31
15. Listing’s law (Listing 1870): 29

Although it’s no longer in fashion in psychology to suggest that empirical generalizations are “laws”, I think the perception ones have held up fairly well. In perhaps every case exceptions have been found, but most of the laws are still useful as generalizations over a lot of empirical territory.

Many people are generally skeptical of psychology as a science, and their voices have grown louder thanks to recent cases of fraud and to articles such as “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant”, recently published in Psychological Science. So it’s nice to be reminded that psychological science has produced robust generalizations.

On the other hand, few question the validity of perception and psychophysics research, which provides many of the laws above; the skeptics are thinking more of other areas, perhaps social psychology, clinical psychology, or developmental psychology. In those areas, effect sizes are smaller and data are harder to gather, so published results are more likely to be statistical flukes.

The “file drawer problem” is clearly one of the biggest reasons to mistrust psychological results, and I’d say it’s probably the biggest problem in all of science. The file drawer problem refers in part to the fact that when scientists can’t replicate a previously published effect, they are very likely to file the results away rather than try to publish them. So I’ve been helping create a website, psychfiledrawer.org (currently in beta), for people to report their failed replications.

Teigen, K. (2002). One Hundred Years of Laws in Psychology. The American Journal of Psychology, 115(1). DOI: 10.2307/1423676

Evidence Charts receives commendation

Our Evidence Charting project has received an Undergraduate Learning, Teaching, and Assessment Resource Commendation.

The commendation was received from the Australian Learning and Teaching Council (ALTC) in conjunction with the Teaching, Learning and Psychology Interest Group of the Australian Psychological Society (APS) and the Australian Psychology Educators Network (APEN).

EvidenceChart.com

Let me know if you’re interested in using evidence charts in your teaching.

For researchers, we’re using it as a research synthesis format, sort of like a very-concise review article. If you have a topic that you think it would work well for, drop me a line. We hope to publish some charts in a few scientific journals.

Color space pictured and animated (Derrington Krauskopf Lennie)

The Derrington, Krauskopf and Lennie (1984) color space is based on the Macleod-Boynton (1979) chromaticity diagram. Colors are represented in 3 dimensions using spherical coordinates that specify the elevation from the isoluminant plane, the azimuth (the hue) and the contrast (as a fraction of the maximal modulations along the cardinal axes of the space).

It’s easier for me to think of a color in Cartesian DKL coordinates (a small conversion sketch follows the list below), with the dimensions:

  • Luminance, or L+M: the sum of the L and M cone responses
  • L-M: the difference of the L and M cone responses
  • S-(L+M): the S cone response minus the sum of the L and M cone responses
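For going between the two descriptions, here is a minimal R sketch of one common convention for converting the spherical DKL coordinates (elevation, azimuth, contrast) into these Cartesian axes. Sign and scaling conventions differ between labs and software packages, so treat this as illustrative rather than definitive.

# Convert spherical DKL coordinates to Cartesian axes, under one common
# convention: elevation is measured up from the isoluminant plane, azimuth
# is measured within that plane, and contrast is the radius. Scaling of
# each axis relative to cone contrast is display-dependent and omitted.
dkl_sph_to_cart <- function(elevation_deg, azimuth_deg, contrast) {
  elev <- elevation_deg * pi / 180
  azim <- azimuth_deg * pi / 180
  c(lum = contrast * sin(elev),              # luminance (L+M) axis
    LM  = contrast * cos(elev) * cos(azim),  # L-M axis
    S   = contrast * cos(elev) * sin(azim))  # S-(L+M) axis
}

# An isoluminant color modulated purely along the S-(L+M) axis:
dkl_sph_to_cart(elevation_deg = 0, azimuth_deg = 90, contrast = 1)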

The three classes of cones respond a bit to almost all colors, but some reds excite L cones the most, some greens M cones the most, and some blues S cones the most.

I’ve created the movie below (thanks Jon Peirce and PsychoPy) to show successive equiluminant slices of DKL color space, plotted in Cartesian coordinates. These render correctly on my CRT screen, but the colors will be distorted on any other screen. Nevertheless it helps you get a feel for the gamut (the colors that can be represented) of a typical CRT at each luminance, where -1 is the minimum luminance of the CRT and +1 is its maximum. The letters R, G, B and accompanying numbers show the coordinates of the phosphors (each gun turned on by itself).

Derrington, A. M., Krauskopf, J., & Lennie, P. (1984). Chromatic mechanisms in lateral geniculate nucleus of macaque. The Journal of Physiology, 357, 241-265. PMID: 6512691

MacLeod, D. I., & Boynton, R. M. (1979). Chromaticity diagram showing cone excitation by stimuli of equal luminance. Journal of the Optical Society of America, 69(8), 1183-1186. PMID: 490231

Above attention’s speed limit, blindness for spatial relationships

We discovered that when an array of colored discs was spun so fast that attention could no longer keep up with it, people could no longer perceive which colors were adjacent.

Together with an additional attentional cueing experiment, this phenomenon suggests that a shift of attention is required to mentally link adjacent elements and apprehend their spatial relationship.

The experiments are described in:
Holcombe, A., Linares, D., & Vaziri-Pashkam, M. (2011). Perceiving Spatial Relations via Attentional Tracking and Shifting Current Biology DOI: 10.1016/j.cub.2011.05.031

The movie shows the main stimulus, but unfortunately it can’t be displayed fast enough on a webpage for you to lose the ability to apprehend the spatial relationships among the colors.

I have written a gentle introduction to perceptual speed limits such as this one.

An interactive turnkey tutorial to give students a feel for brain-style computation

A new version of my 100-minute interactive neural network lesson is available. The lesson webpages guide university-level students through learning and directed play with a connectionist simulator. The outcome is that students gain a sense of how neuron-like processing units can mediate adaptive behavior and memory.

Making the lesson was fairly straightforward, thanks to Simbrain, a connectionist simulator which is the easiest to use of any I’ve seen. After a student downloads the Java-based Simbrain software and double-clicks, she is on her way with menu-driven and point-and-click brain hacking.

New to Simbrain and my lessons this year are animated avatars that depict the movements of the virtual organism controlled by the student’s neural network. This feature, added to the software by Simbrain creator Jeff Yoshimi, provided a nice lift to student engagement compared to previous versions. The first lesson is mainly intended to bring students to understand the basic sum-and-activate functioning of a simplified neuron and how wiring such units together can accomplish various things. It’s couched in terms of a guy chasing girls scenario, so that most university students can relate. The second lesson gives students a rudimentary understanding of pattern learning and retrieval with a Hebbian learning rule.
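To give a flavour of the two ideas the lessons target, here is a minimal sketch in R (not Simbrain code, just an illustration of the concepts) of a sum-and-activate unit and a Hebbian weight update.

# A simplified neuron: a weighted sum of inputs passed through a threshold.
activate <- function(inputs, weights, threshold = 0.5) {
  as.numeric(sum(inputs * weights) > threshold)   # 1 = fire, 0 = stay silent
}

# Hebbian learning: strengthen a connection when the units at both ends
# are active together ("cells that fire together wire together").
hebb_update <- function(weights, pre, post, rate = 0.1) {
  weights + rate * pre * post
}

w <- c(0.2, 0.4)                 # connection weights from two input units
x <- c(1, 1)                     # both input units active
out <- activate(x, w)            # weighted sum 0.6 exceeds threshold, so out = 1
w <- hebb_update(w, pre = x, post = out)   # weights grow toward the active inputs
w                                # 0.3 0.5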

Going through the lessons is certainly not as fun as playing a video game, but it is more interesting and interactive than a conventional university tutorial. In the long term, I hope we can make it more and more like a (educational) video game. Some say that’s the future of education, and whether or not that’s true in general, I think neural networks and computational neuroscience are ideally suited for it.

Development of the Simbrain codebase has been gathering steam, although that may not be apparent from the Simbrain website because portions of it haven’t been updated in a while. Lately, a few people have jumped in to help with development.

Already the code is developed enough to provide all the functionality needed for school- and university-level teaching. Giving a series of classes with it could easily be done at this point. However, to do so you’d have to take time to develop associated lecture material and refine the example networks. If you don’t have time for that, you should consider simply using my lessons to give students a taste.

The lessons have been battle-tested on a few hundred psychology students at University of Sydney, without any significant problems. The tutors (aka demonstrators or teaching assistants) took the introductory slides provided and used them to help orient the students before they started following the web-based instructions for the lessons. Contact us if you want to take it further.

technical note: d-prime proportion correct in choice experiments (signal detection theory)

If you don’t understand the title of this post, you almost certainly will regret reading further.

We’re doing an experiment in which one target is presented along with m distracters. The participant tries to determine which is the target and must respond with their best guess about which it is. Together, the m distracters + 1 target = the “number of alternatives”.

The plots below show the predictions from vanilla signal detection theory for the relationship between probability correct, d-prime, and the number of alternatives. Each distracter is assumed to have a discriminability of d-prime from the target.

[Figure: signal detection theory relationship among percent correct, d-prime, and number of alternatives]
The two plots are essentially the inverse of each other.

Note that many studies use two-interval forced choice, wherein the basic stimulus containing the distracters is presented twice, once with the signal added, and the participant has to choose which interval contained the signal. In contrast, here I’m showing predictions for an experiment wherein the target and all its distracters are presented only once, and the participant reports which location contained the target.

I should probably add a lapse rate to these models, and generate curves using a reasonable lapse rate like .01.

I’ll post the R code using ggplot that I made to generate these later; email me if I don’t or you want it now. UPDATE: the code, including a parameter for lapse rate.

reference: Hacker, M. J., & Ratcliff, R. (1979). A revised table of d’ for M-alternative forced choice. Perception & Psychophysics, 26(2), 168-170.
# To determine the probability of the target winning (being chosen), A, use
# the law of total probability: p(A) = sum over B of p(A|B) p(B).
# Here B ranges over the possible internal estimates (TTC estimates, in our
# experiment) for the target, and p(A|B) is the probability that the
# distracters all fall below that target estimate, B.
#
# x: the target's internal estimate on a trial, drawn from a normal
#    distribution centered on dprime (the distracters are centered on 0)
# pnorm(x): probability that a single distracter's estimate is less than x
# m: number of alternatives, m-1 of which are distracters
# integrand: p(A|B) * p(B) = pnorm(x)^(m-1) * dnorm(x - dprime)
# Hacker & Ratcliff (1979) and Elliott (1964) derive this, as did I.
# Jakel & Wichmann say that "numerous assumptions necessary for mAFC" where
# m > 2, but it is not clear whether they mean bias only or also d'.
pCorrect <- function(dprime, m) {
  integrate(function(x) pnorm(x)^(m - 1) * dnorm(x - dprime),
            lower = -Inf, upper = Inf)$value
}
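And since the post mentions a lapse rate and the ggplot code, here is a minimal sketch of how curves like the plotted ones could be generated from the pCorrect function above. The .01 lapse value follows the suggestion earlier in the post; the plotting details are my own guess, not necessarily the code referred to in the UPDATE.

# Generate proportion-correct curves as a function of d-prime for several
# numbers of alternatives, including a lapse rate: on a lapse trial the
# participant guesses blindly and is correct with probability 1/m.
library(ggplot2)

lapse <- 0.01
grid <- expand.grid(dprime = seq(0, 4, by = 0.1), m = c(2, 4, 8))
grid$pCorrect <- mapply(function(d, m) {
  (1 - lapse) * pCorrect(d, m) + lapse * (1 / m)
}, grid$dprime, grid$m)

ggplot(grid, aes(x = dprime, y = pCorrect, colour = factor(m))) +
  geom_line() +
  labs(x = "d-prime", y = "proportion correct", colour = "alternatives")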