Yellow journalism and Manhattan murders

The headline screams “You’re 45% more likely to be murdered in de Blasio’s Manhattan”.

The evidence? Sixteen people have been killed so far this year in Manhattan, against only eleven over the same period last year.

Does this evidence indicate you are more likely to be murdered, as the headline says? To find out, I tested whether a constant murder rate could explain the results. The probability of being murdered over the same period last year was approximately 11/Manhattan’s population = 11/1,630,000 ≈ 0.00000675, or about 0.000675%.

Is it likely that with the same murder rate this period this year, one would get a number as high as 16 murders? Yes.

This can be seen by calculating the 95% confidence interval for the rate of 11/1,630,000, which, according to three different statistical methods, corresponds to between 5 and 20 murders. That is, even with a constant murder rate, statistical fluctuations alone could easily have put the number of murders over this period as low as 5 or as high as 20. In the same way, if one flips a coin 10 times, one may get 3 heads the first time and 6 the next, without the chance of a head changing.
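As a rough check on that interval, here is a minimal sketch that treats last year's 11 murders as a Poisson count and computes the exact (Garwood) 95% interval for its mean. This is my own illustration and is not necessarily one of the three methods used in the post; the function names are made up for the example.

```python
import math

def poisson_cdf(k, mean):
    """P(X <= k) for a Poisson variable with the given mean."""
    if k < 0:
        return 0.0
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return total

def solve_decreasing(f, target, lo, hi, tol=1e-9):
    """Bisection for f(x) = target, where f decreases as x grows."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_poisson_interval(count, level=0.95):
    """Exact interval for a Poisson mean, given one observed count."""
    alpha = 1 - level
    bracket = 10 * count + 20  # generous upper bound for the search
    # Lower limit: the mean at which seeing `count` or more has probability alpha/2.
    lower = 0.0 if count == 0 else solve_decreasing(
        lambda m: poisson_cdf(count - 1, m), 1 - alpha / 2, 0.0, bracket)
    # Upper limit: the mean at which seeing `count` or fewer has probability alpha/2.
    upper = solve_decreasing(
        lambda m: poisson_cdf(count, m), alpha / 2, 0.0, bracket)
    return lower, upper

low, high = exact_poisson_interval(11)
print(f"{low:.1f} to {high:.1f} murders")
```

Under this Poisson assumption the interval comes out at roughly 5.5 to 19.7 murders, so 16 sits comfortably inside it.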

Doing this more properly means comparing the two rates directly. I did this using three different methods, all of which found no significant difference.
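One simple way to compare the two periods directly, sketched here as an illustration rather than as one of the three methods actually used, is a conditional test. If the murder rate and the population were the same in both periods, then each of the 27 murders would be equally likely to fall in either year, so the 16/11 split can be compared against a fair-coin binomial. The function name is hypothetical.

```python
from math import comb

def equal_rate_p_value(count_a, count_b):
    """Two-sided exact p-value for H0: both counts come from the same rate.

    Conditions on the total, so under H0 count_a ~ Binomial(total, 0.5).
    """
    n = count_a + count_b
    observed = abs(count_a - n / 2)
    # Sum the probability of every split at least as lopsided as the observed one.
    extreme = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= observed)
    return extreme / 2 ** n

p_murders = equal_rate_p_value(16, 11)
print(round(p_murders, 2))
```

The p-value is about 0.44, nowhere near conventional significance, which agrees with the conclusion above.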

The article also reports that the number of shooting incidents is higher this year, 50 instead of 31. Using the three different statistical methods again, this was (barely) significantly different. So here the journalist has a point. But this should be taken with a big grain of salt. Journalists are always looking for “news”, and if they repeatedly look at how many people have been murdered/shot, eventually they are guaranteed to find an apparent difference, because all possible statistical fluctuations will happen eventually.
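Both points in this paragraph can be sketched with a few lines of arithmetic. The first part applies a conditional fair-coin test to the shooting counts (an illustrative method, not necessarily one of the three I used). The second shows how the chance of at least one spurious "significant" difference grows with the number of comparisons a journalist makes, under the simplifying assumption that the comparisons are independent and each is tested at the 5% level.

```python
from math import comb

def equal_rate_p_value(count_a, count_b):
    """Two-sided exact p-value, conditioning on the total count."""
    n = count_a + count_b
    observed = abs(count_a - n / 2)
    extreme = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= observed)
    return extreme / 2 ** n

def chance_of_false_alarm(n_comparisons, alpha=0.05):
    """P(at least one false positive) across independent comparisons."""
    return 1 - (1 - alpha) ** n_comparisons

p_shootings = equal_rate_p_value(50, 31)
print("shootings p-value:", round(p_shootings, 3))

for n in (1, 2, 5, 10, 20):
    print(n, "comparisons:", round(chance_of_false_alarm(n), 2))
```

The shootings p-value lands just under 0.05, and with 20 independent looks at the data the chance of at least one false alarm is already about 64%.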

The statistics and the code are here.

I only did all this and wrote this post because Hal Pashler saw someone tweet the NYPost piece. Hal knew I had previously looked into the statistics of proportions and asked whether the headline was justified. I invite others to disagree with my calculations if they have a better way of doing it. I don’t think different methods will give a very different result, however.

Scholarly publisher profit update.

I made the slide below for a talk in 2012 to show that the biggest corporate scientific publishers are outrageously profitable. But that was 3 years ago. How do they look now?

Outdated figures, created in 2012

The 40% figure for Wiley in my original slide (above) may have been “overinflated”, as someone helpfully explained on Twitter. The issue is that the figure (which I got from Heather Morrison’s thesis) did not have costs subtracted from it (as Heather explained to me later), whereas costs had been subtracted for the other companies to yield operating profit. Subtracting the costs yields a much lower profit of about 20%. This revised figure is, however, an underestimate, in that it doesn’t include the profits sent to the societies with which many of the journals were published in partnership. But because Wiley has not released detailed numbers since 2012, I have dropped them from this update.

In my updated table (below), Rio Tinto’s operating profit margin is at 23%, based on 11.3 billion in operating profit and 47.7 billion in consolidated sales revenue (p. 27 of their media release) in 2014.

The 10% operating profit figure for BMW reflects their 2013 revenue of 76B and operating income of 7.85B. I didn’t find more recent figures, but indications are their profit hasn’t changed much.

Google’s profit was 25% for 2014. Income from operations was 16.5B, with revenue of 66B, according to their 10-K.

Apple at 29% for the year ending 27 Sep 2014 reflects operating income of 52.5B on net sales of 182.2B from their 10-K.

Springer seems not to have released detailed numbers since 2012, but for 2012 they reported sales of 981.1m euros and EBITDA of 342.8m, a margin of 35%. EBITDA is not the same as operating profit: it is calculated before depreciation and amortisation are subtracted, so this figure probably overstates Springer’s operating margin somewhat.

Elsevier’s journal publishing business is in its Scientific, Technical, & Medical division. That division reported, for 2014, an adjusted operating profit of £762m on £2,048m in revenue, or 37%. Their 2013 margin was also 37%.
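For anyone who wants to check the arithmetic, the margins quoted above can be recomputed from the profit and revenue figures given in this post. This is only a sketch: the currencies differ between companies (which doesn't matter for a ratio), and the Springer figure is EBITDA rather than operating profit.

```python
# (profit, revenue) as quoted in the post; units are billions or millions
# of each company's reporting currency, which cancel out in the ratio.
figures = {
    "BMW (2013)": (7.85, 76.0),
    "Rio Tinto (2014)": (11.3, 47.7),
    "Google (2014)": (16.5, 66.0),
    "Apple (FY2014)": (52.5, 182.2),
    "Springer (2012, EBITDA)": (342.8, 981.1),
    "Elsevier STM (2014)": (762.0, 2048.0),
}

margins = {name: profit / revenue for name, (profit, revenue) in figures.items()}

# Print from lowest to highest margin.
for name, margin in sorted(margins.items(), key=lambda item: item[1]):
    print(f"{name:<26}{margin:6.1%}")
```

All the results agree with the percentages quoted to within rounding; Rio Tinto comes out at 23.7%, which I have given as 23%.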

Because it’s difficult to know and allocate costs, these numbers should only be considered as ballpark estimates, and I’m unlikely to update the tables each time someone arrives at a different figure.

These Elsevier and Springer operations do more than publish journals. They also make money from a few other sources, such as databases of medical and legal information. Rather than their overall profit, we’d like to know how much they make from journal articles specifically. There’s no way to know this, because the companies don’t release this information. Elsevier even requires the universities that subscribe to sign confidentiality agreements so that no one can know what the subscription prices are.

How about open-access publishers? The Public Library of Science (PLoS) is the largest (by number of articles published). They don’t charge for subscriptions, so they make no money there and anyone can read the articles they publish. However, they do charge most authors, or ultimately the authors’ funders, a fee for each manuscript. Although, as a non-profit organization, they don’t make profits, we can look at their numbers and calculate something like an operating surplus margin.

PLoS last published detailed figures for 2013. They reported gross revenue and support of 46.87M (after waiving publication fees for authors who couldn’t pay). 2013 saw an increase in net assets of 9.87M, which is 21% of the gross revenue and support. The majority of their articles appear in PLoS ONE, which charges the authors $1350 per article. The surplus is eventually fed back into their operation, supporting further technological innovation in publishing, among other activities. I therefore think they shouldn’t be lumped together with for-profit publishers, but some people have asked me to compare them to the for-profits, so in the alternate version of the table below, I’ve included PLoS.


Hindawi charges authors only about $600 to publish each article. Despite this relatively low price, they make an extraordinarily high profit of something like 52%. This may reflect some shortcuts and short-changing in their provision of services. As Jeffrey Beall has written, “Hindawi is not on my list of questionable publishers. I do receive complaints about Hindawi, however. They use spam a lot, most of their over 500 journals lack editors in chief, and it seems to be a publisher that focuses just on the authors’ needs and not so much the readers’.” Many open access publishers probably earn even higher profits, due to still-worse behavior. These can be classified as predators who scam unsuspecting authors.

While these profit figures show that the sciences clearly have enough money available to support publishing, the humanities are a different story. Many publishers in that area, such as university presses, are barely getting by, much like today’s newspaper publishers. So the large science, technology, and medicine publishers are outliers. Some, such as Elsevier, are still as fat as ever, suggesting that moves toward open access can go a lot further without endangering the provision of publishing services.

Disclosure: I am an editor for the open-access Registered Replication Reports, a type of article that appears in Perspectives on Psychological Science, a journal published by Sage for the Association for Psychological Science. Sage is a private company that is not required to report its financials. I wasn’t able to find profit figures for them.

Four Reasons to Oppose the Use of Elsevier’s Services for the Medical Journal of Australia

Elsevier has a history of unethical behaviour:
  1. Elsevier created fake medical journals to promote Merck products.
  2. Elsevier sponsored arms fairs for the international sale of weapons.
  3. Elsevier sponsored a bill that would have eliminated the NIH mandate that medical research be made freely available within 12 months of publication.
  4. Elsevier requires university and medical libraries to sign agreements that prevent them from reporting the exorbitant prices the libraries pay to subscribe to Elsevier’s journals.
Thanks to such practices, Elsevier makes an outrageous level of profit, 36% of revenue, higher than BMW and higher than the mining giant Rio Tinto. While researchers and research funders are attempting to transition medical and science publishing to an open access model, Elsevier seeks to hinder this transition. It is their corporate mandate to preserve the high level of profits they make by charging subscription fees for the articles that describe taxpayer-funded research.


Researchers ought to be using other providers, not channeling more money into Elsevier.


A “tell” for researcher innumeracy?

Evaluating scientists is hard work. Assessing quality requires digging deep into a researcher’s papers, scrutinising methodological details and the numbers behind the narrative. That’s why people look for shortcuts such as the number of papers a scientist has published or the impact factor of the journals they were published in.

When reading a job or grant application, I frequently wonder: Does this person really take their data seriously and listen to what it’s telling them, or are they just trying to churn out papers? It can be hard to tell. But I’ve noticed an unintentional tell in the use of numbers. Some people, when reporting numbers, habitually report far more decimal places than are warranted.

For example, Thomson/ISI reports its much-derided journal impact factors to three decimal places. This is unwarranted, an example of false precision, both because of the low counts of article numbers and citations typically involved, and because their variability year to year is high. One decimal place is plenty (and given how poor a metric impact factor is, I’d prefer that impact factor simply not be used).
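To give a feel for the size of the problem, here is a back-of-the-envelope sketch with made-up numbers: a hypothetical journal with 150 citable articles that received 300 citations. It assumes the citation count behaves like a Poisson variable, which is only an approximation; real citation distributions are more skewed, so the true uncertainty is probably larger.

```python
import math

# Hypothetical journal, numbers invented for illustration only.
citations = 300
articles = 150

impact_factor = citations / articles

# Poisson assumption: the standard deviation of a count is its square root.
standard_error = math.sqrt(citations) / articles

# How much a single extra citation moves the impact factor.
one_citation_step = 1 / articles

print(f"impact factor: {impact_factor:.3f}")
print(f"approximate standard error: {standard_error:.3f}")
print(f"change from one extra citation: {one_citation_step:.3f}")
```

With these numbers the standard error is about 0.12, so even the first decimal place is uncertain, and a single additional citation shifts the third decimal place by about 7.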

When I see a CV with journal impact factor reported to three decimal places, I feel pushed toward the conclusion that the CV’s owner is not very numerate. So the reporting of impact factor is useful to me; not, however, in the way the researcher intended.

I don’t necessarily expect every researcher to fully understand the sizes, variability, and distribution of the numbers that go into impact factor, so I’m more concerned by how researchers report their own numbers. When to report all the decimal places calculated can be a subtle issue, however, as full reporting of some numbers is important for reproducibility.

Bottom line, researchers should understand how summaries of data behave. Reporting numbers with faux precision is a bad sign.

For references on the issue of the third decimal place of impact factor:

UPDATE 8 May: Read this blog on the topic

Bar-Ilan, J. (2012). Journal report card. Scientometrics, 92, 249–260.

Mutz, R., & Daniel, H. D. (2012). The generalized propensity score methodology for estimating unbiased journal impact factors. Scientometrics, 92, 377–390.

old (from 2011) fast-track fee protest letter

There has been renewed interest in fast-track fees, after Nature Scientific Reports began piloting their use. Back in 2011, we wrote a protest letter to seven journals that were using fast-track fees at that time (some have since discontinued). The original website where we posted the letter is defunct, so I am re-posting here.

We write to ask that you discontinue the policy of fast-tracking submissions for a fee.

We have two objections to the policy. First is that we are against any form of preferential treatment for those who can pay. Fast-tracking for a fee creates a two-tier system, wherein the well-funded have an unfair advantage over the less well-to-do; in particular, it exacerbates the differences between developed and developing nations. The fast-track policy at the least allows faster publication by those with funds, improving the chance for the funded to win subsequent grants and to publish before other labs working on the same topic.

Our second objection to the policy stems from our concern that fast-tracked manuscripts will receive an advantage above and beyond just faster publication. Your policy requires that reviewers review more rapidly and editors make their decision in a shorter time than for non-fast-tracked manuscripts. There are three possible negative effects of this. First is that the reduced time for reviewers to spend on their work may lead to more superficial and less stringent reviews. Second is that the editor may sometimes have to complete their action letter on the basis of fewer reviews, when the reviewers do not finish by the deadline. The consequence is that at least some fast-tracked articles will receive less critical reviewing than those by author teams who do not pay for fast-track. The third possible negative effect reflects the linkage between fast-tracked articles and the journal’s finances. Your journal would receive more money if it evaluates fast-tracked articles less stringently, and even if it does not succumb to this incentive the readers may always have that perception.

Overall, the association of author fees with preferential treatment may eventually imperil science’s reputation among governments and the public. Science traditionally has been something of a refuge from the injustice of rich vs. poor, and previously in publishing there has always been the expectation that publication of an article is a mark of the quality of the work, not the depth of the pockets behind it.

Superficially, the policy of fees for fast-tracking seems similar to the Gold Open Access model, in which authors pay a fee to have their article published if it passes peer review. In most of those journals, however, the policy is set so that authors who pay are treated the same as those who don’t. Most Gold OA journals offer a waiver for authors who cannot afford the usual fee, and reviewers and editors do not know whose fees are waived and whose are not. And in those unfortunate cases of journals that require a fee for all, at least there is no difference within the journal with some articles receiving preferential treatment.

We, the undersigned, will not submit work to a journal which offers competitive advantages at a financial premium; nor will we review for any such journal.

Alex O. Holcombe, PhD, Senior Lecturer, School of Psychology, University of Sydney
Claudia Koltzenburg, Managing editor, Cellular Therapy and Transplantation (an open access journal in Western/Russian cooperation), University Medical Center Hamburg-Eppendorf, Germany
Kaan Öztürk, Dept. of Information Systems and Technologies, Yeditepe University, Istanbul, Turkey
Ayşe Karasu, METU, Dept. of Physics, Ankara, Turkey
Arman Abrahamyan, PhD, Postdoctoral Research Fellow, School of Psychology, University of Sydney
Bill Hooker, Portland, OR
William Gunn, San Diego CA
Daniel Mietchen, PhD, Jena, Germany
Daniel Linares, PhD, Generalitat de Catalunya, Spain
Barton L. Anderson, School of Psychology, University of Sydney
Kiley Seymour, PhD, Alexander von Humboldt postdoctoral fellow, Berlin, Germany
Bjorn Brembs, PhD, Heisenberg Fellow, Freie Universität Berlin, Germany
M Fabiana Kubke, PhD, University of Auckland, New Zealand
Graham Steel, Glasgow, Scotland
Matthew Davidson, Psychology Dept, Columbia University
Richard Badge, PhD, Lecturer, Department of Genetics, University of Leicester, UK
Pedro Mendes, PhD, Professor, School of Computer Science, The University of Manchester, UK
R. Steven Kurti, PhD, Director Biomaterials and Photonics Laboratory, Loma Linda University School of Dentistry, California

The above are the original authors and signatories. The link [now dead] will reveal new (post 25 April 2011) signatories.

Reporting items from a stream, and mixture modeling to reveal buffering and a bottleneck

In our basic task, one or two streams of stimuli are rapidly presented. The target(s) to be reported are highlighted with cues that encircle them. On half of trials, participants are first queried about the left target, and on the other half they are first queried about the right target. This has no significant effect on the main result: a substantial disadvantage in reporting the right target, if the left target must also be reported.

My collaborators and I have started using a new behavioural technique to better understand attentional selection from a rapid stream of stimuli. We have applied this to gain insights into the effect of naps on learning (Cellini et al., in press), the nature of the attentional blink (Goodbourn et al., in preparation), and function in parietal patients.

Here I explain the technique in the context of our study of a particular attentional phenomenon called pseudoextinction (Goodbourn & Holcombe, 2015).

The technique dissociates time of sampling visual information from the nature of subsequent processing. Stimuli are presented rapidly in series (a “stream”), shown here with one stream of letters on the left and a second stream on the right.

On an unpredictable frame in the sequence, the stimuli on that frame are cued by two circles, which enclose the stimuli. The participants’ task is to report the cued stimuli, letters in this case.

Accuracy is much poorer for the cued stimulus on the right than for the cued stimulus on the left. But if only one of the streams is cued, accuracy is equally high whether the cue is on the left or the right. This deficit specific to two-target conditions is pseudoextinction. The deficit is unaffected by which stream the participant is asked to report first. It likely reflects a severe capacity limit.

a. Each response of the participant corresponds to a particular item in the stream (because all items are presented on each trial). The distribution of the positions of these items is usually centred around the time of the cue, denoted as zero. b. Mixture modelling fits the data with a combination of two distributions, the guessing distribution shown in light grey and a Gaussian, shown in dark grey. This fit yields the latency (mean) and temporal precision (standard deviation) of the Gaussian as well as the proportion of guessing trials.

Participants’ responses were coded in terms of the serial position of the corresponding item in the stream. For example, if a participant reports the letter ‘A’ for the left stream and it was presented not at the time of the cue but two frames later, that response is coded as +2. If their report corresponds to the item immediately preceding the cued stimulus, it is coded as -1, and a report of the cued item is coded as 0. Random guesses will thus contribute an approximately uniform distribution to the histogram of serial position errors. This is quantified by mixture modelling, which determines the relative proportion of guesses and cue-related reports that best fit the data. We model the cue-related responses as a Gaussian distribution. The mixture modelling procedure yields its latency (position of the peak of the distribution relative to the time of the cue) and precision (standard deviation). It also estimates the proportion of trials on which participants guessed or misperceived the letter versus the complementary proportion, which we call efficacy, of trials on which participants reported a letter from around the time of the cue.
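To make the mixture modelling concrete, here is a simplified sketch, not the code used in our papers. It assumes the guessing distribution is exactly uniform over a fixed range of serial positions, evaluates a continuous Gaussian density at the integer positions, and fits the three parameters with a basic expectation-maximisation loop. The data are simulated, and the function names and parameter values are invented for the example.

```python
import math
import random

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit_mixture(errors, lo, hi, n_iter=100):
    """Fit efficacy, latency and precision to serial position errors by EM.

    Assumes guesses are uniform over the integer positions lo..hi.
    """
    guess_density = 1.0 / (hi - lo + 1)
    efficacy, mu, sigma = 0.5, 0.0, 1.0  # starting values
    for _ in range(n_iter):
        # E-step: probability that each response came from the Gaussian.
        resp = []
        for x in errors:
            g = efficacy * gaussian_pdf(x, mu, sigma)
            u = (1 - efficacy) * guess_density
            resp.append(g / (g + u))
        # M-step: re-estimate the parameters from the weighted responses.
        total = sum(resp)
        efficacy = total / len(errors)
        mu = sum(r * x for r, x in zip(resp, errors)) / total
        var = sum(r * (x - mu) ** 2 for r, x in zip(resp, errors)) / total
        sigma = max(math.sqrt(var), 0.3)  # floor keeps the Gaussian from collapsing
    return efficacy, mu, sigma

# Simulate a hypothetical participant: 70% cue-related reports, 30% guesses.
rng = random.Random(2015)
lo, hi = -8, 8
true_efficacy, true_latency, true_precision = 0.7, 0.3, 0.9
errors = []
for _ in range(2000):
    if rng.random() < true_efficacy:
        x = round(rng.gauss(true_latency, true_precision))
        errors.append(max(lo, min(hi, x)))
    else:
        errors.append(rng.randint(lo, hi))

efficacy_hat, latency_hat, precision_hat = fit_mixture(errors, lo, hi)
print(f"efficacy {efficacy_hat:.2f}, latency {latency_hat:.2f}, precision {precision_hat:.2f}")
```

The fitted values should land close to the simulated ones, with the precision estimate slightly inflated because the responses are rounded to whole serial positions. In real data the guessing distribution is only approximately uniform, so a real analysis would need to model it more carefully than this sketch does.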

In cuing experiments, researchers typically conceive of the appearance of the cue as triggering attention to begin sampling from the scene. However, we have consistently observed that the distribution is symmetric and centred near the time of the cue. This indicates that rather than the cue triggering the intake of information from the letter stream, the letters are taken into a buffer before the cue is even presented. If letters were not already in a buffer at the time of the cue, responses from the left (earlier) side of the distribution would be relatively uncommon, skewing the distribution towards later responses.

When two streams are presented, participants perform much better for the stream on the left (if the streams are in a horizontal configuration) or much better for the stream on the top (if the streams are vertically arrayed). If only one stream is presented, participants perform approximately equally in all four positions (data not shown).

The pseudoextinction phenomenon, a right-side deficit when both streams are cued, manifests both in raw accuracy and also in the accuracy-related parameter of the mixture modelling. This is the efficacy parameter – the proportion of trials captured by the cue-related Gaussian distribution. Whereas efficacy when only one stream is presented or cued is similar on both the left and the right of fixation, and above and below fixation (not shown), when two streams are presented one stream suffers. The right stream suffers in a horizontal arrangement and in a vertical arrangement the inferior stream suffers, consistent with preferred reading order.

The decrease in efficacy for the extinguished stream is not accompanied by a change in latency or standard deviation of the Gaussian distribution of cue-related responses. Moreover, the correlogram of the serial position error for the two streams reveals that the two streams are sampled independently, indicating that the items are buffered independently, without regard to reading order or which hemisphere they are processed by. Together these results suggest that items are always sampled from the stream in the same way, but a subsequent processing limitation results in pseudoextinction if two targets must be processed.

Related patterns of performance have arisen in previous literature, and typically have been attributed to a difference between the left and right hemisphere (e.g. Scalf, Banich, Kramer, Narechania, & Simon, 2007). That however cannot explain the superior/inferior difference, so researchers sometimes then appeal to a difference in dorsal vs. ventral cerebral functioning. We suspect it instead reflects attentional prioritisation of the left item for serial high-level processing, for tokenisation or memory consolidation.


Cellini, N., Goodbourn, P.T., McDevitt, E.A., Martini, P., Holcombe, A.O., & Mednick, S.C. (in press). A daytime nap reduces the attentional blink. Attention, Perception, & Psychophysics.

Goodbourn, P.T. & Holcombe, A.O. (2015). ‘Pseudoextinction’: Asymmetries in simultaneous attentional selection. Journal of Experimental Psychology: Human Perception and Performance, 41(2), 364–84.

Martini, P. (2013). Sources of bias and uncertainty in a visual temporal individuation task. Attention, Perception, & Psychophysics, 75, 168–181.

Scalf, P. E., Banich, M. T., Kramer, A. F., Narechania, K., & Simon, C. D. (2007). Double take: parallel processing by the cerebral hemispheres reduces attentional blink. Journal of Experimental Psychology. Human Perception and Performance, 33(2), 298–329. doi:10.1037/0096-1523.33.2.298

Nature Scientific Reports. Fast-tracking fees history and concerns.

Nature Scientific Reports is piloting fast-tracking for a fee.

Four years ago, I noticed that several journals had adopted such a policy. I raised a number of concerns, such as

  • What happens if the fast-tracking period elapses and a reviewer hasn’t gotten their review in yet? Will the decision about the manuscript be made without that review?
  • How is the additional money used? Does any go to reviewers?
  • Does the action editor know when a particular manuscript is being fast-tracked? Do the reviewers? To avoid monetary influence, both should be blind to this, but that seems impossible if these things are to be expedited.
  • Will articles which benefited from fast-tracking be indicated in a note associated with those articles? Without such a policy, all articles in the journal may be sullied, at least in the minds of cynics.
  • Are the fees worth risking the appearance of favoritism for money, the disadvantage in speed to scientists with fewer resources, and the possible loss of public trust in science?

We started a petition against the policy, and our complaints seem to have led to the demise of the policy at a few journals. For details, see my previous posts on the topic.

I suggest that the tag #fastTrackFee be used on social media to discuss this.