Bayesian jokes

It’s the end of the year, and I’m indulging myself by posting these Bayesian jokes. The first two were inspired by the #AcademicNoir hashtag.

And one outside the noir domain:

When retraction is not enough

A study reporting two experiments, suggesting that “Sadness impairs color perception,” was recently retracted from Psychological Science. But some colleagues and I don’t think the retraction goes far enough.

In the retraction notice, the authors suggested that after revising their second experiment to address the problems that they noted with it, they would seek to re-publish their original Experiment 1 with the revised Experiment 2.

But Experiment 1, and the basic methodology behind both experiments, is shoddy — there are more problems than just those mentioned in the retraction notice. Some of these problems are strange anomalies with the data, specific to Thorstenson et al.’s experiments. Other problems, while still significant, are not uncommon in this research area.

When the now-retracted paper first appeared, Twitter exploded with criticism, and many documented the study’s problems extensively on their blogs. Five of us got together over email to write a letter to Psychological Science calling for retraction. But before submitting the letter, we contacted the first author, Chris Thorstenson. He eventually told us that he and his colleagues would retract the paper.

When we saw the retraction notice, we noticed that only a few of the problems with the experiments were mentioned. I have been vexed by studies of this ilk for more than three years and would like to see a general improvement in this research area. So we revised our letter to highlight the additional problems not mentioned by Thorstenson et al. We hope that our revised letter will help Thorstenson et al., plus other researchers in this area, to improve their methods.


What just happened with open access at the Journal of Vision?

Vision researchers recently received an email from ARVO, the publisher of Journal of Vision, that begins:

On January 1, 2016, Journal of Vision (JOV) will become open access.

But in the view of most, JoV has been open access since its inception! It’s always been an author-pays, free access journal: all articles are published on its website and can be downloaded by anyone.

But free-to-download is not enough for open access, not according to the definition of open access formulated in Budapest in 2001. Open access means (according to this definition) the right not only to download articles but also to

distribute, … pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers

But JoV, which has always held the copyright of the articles it publishes, says that “All companies, commercial and nonprofit, should contact ARVO directly for permission to reprint articles or parts thereof”.

Starting in 2016, such permission won’t be required.

However, paying the standard $1,850 publication fee in JoV in 2016 will not get you everything. The updated Budapest declaration recommends that journals use the CC BY license. But JoV’s publisher has instead chosen the CC BY-NC-ND license, meaning that articles cannot be used commercially (“non-commercial”) and that you can’t distribute bits of the article (“no derivatives”). Yet increasingly today, parts of science involve mining and remixing previously-published data and content, which the ND clause of the license prohibits (unless you get special permission). Education and journalism require re-use of bits too; think about how many textbooks and articles on the web show just one figure (a “derivative”) or illustration from a scientific paper.

And while the non-commercial (NC) clause might sound rather harmless for spreading knowledge, it is sometimes unclear what non-commercial really covers. It may prevent universities, especially private ones, from distributing the article as part of course content that a student pays for (via their tuition).

For these reasons, CC BY is the way we should be going, which is why UK funders like the Wellcome Trust and RCUK require that researchers receiving grants from them publish their articles CC BY. To accommodate this, JoV, as part of its new policy, will license your article as CC BY, if you pay an additional $500 fee!

What ARVO has done here is only a small step forward for JoV, and unfortunately a rather confusing one. The bigger change has occurred with ARVO’s journal Investigative Ophthalmology &amp; Visual Science (IOVS), which was previously accessible only via subscription but, starting in 2016, will publish articles under CC BY-NC-ND or CC BY.

As you can see, copyright is complicated. Researchers don’t have time to learn all this stuff. And that means recalcitrant publishers (not ARVO, I mean profiteers like Elsevier) can exploit this to obfuscate, complicate, and shift their policies to slow progress towards full open access.

Thanks to Tom Wallis and Irina Harris.

P.S. I think if ARVO had only been changing JoV’s policies (rather than also those of the subscription journal IOVS) they wouldn’t have written “JoV will become open access” in that mass email. But because they did, it raised the issue of the full meaning of the term.

P.P.S. Partly because JoV is so expensive, at ECVP there’ll be a discussion of other avenues for open access publishing, such as PeerJ. Go! (I’ll be stuck in Sydney).

Yellow journalism and Manhattan murders

The headline screams “You’re 45% more likely to be murdered in de Blasio’s Manhattan”.

The evidence? Sixteen people have been killed so far this year in Manhattan, against only eleven over the same period last year.

Does this evidence indicate you are more likely to be murdered, as the headline says? To find out, I tested whether a constant murder rate could explain the results. The probability of getting murdered over the same period last year was approximately 11 divided by Manhattan’s population: 11/1,630,000 = 0.00000675, or about 0.000675%.

Is it likely that with the same murder rate this period this year, one would get a number as high as 16 murders? Yes.

This can be seen by calculating the 95% confidence interval for a count of 11, which according to 3 different statistical methods spans 5 to 20. That is, even with a constant murder rate, due to statistical fluctuations, the number of murders over this period could easily have been as low as 5 or as high as 20. It’s just like flipping a coin 10 times: one may get 3 heads the first time and 6 the next, without the chance of heads changing.

Doing this more properly means comparing the two rates directly.  I did this using three different methods, all of which found no significant difference.

The article also reports that the number of shooting incidents is higher this year, 50 instead of 31. Using the three different statistical methods again, this was (barely) significantly different. So here the journalist has a point. But this should be taken with a big grain of salt. Journalists are always looking for “news”, and if they repeatedly look at how many people have been murdered/shot, eventually they are guaranteed to find an apparent difference, because all possible statistical fluctuations will happen eventually.
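As a rough illustration of this kind of check, here is a short Python sketch using scipy. The exact (Garwood) Poisson interval and the exact conditional binomial comparison used here are my choices for illustration, not necessarily the same three methods referred to above:

```python
from scipy import stats

def poisson_ci(k, conf=0.95):
    """Exact (Garwood) confidence interval for a Poisson count k."""
    alpha = 1 - conf
    lo = stats.chi2.ppf(alpha / 2, 2 * k) / 2 if k > 0 else 0.0
    hi = stats.chi2.ppf(1 - alpha / 2, 2 * (k + 1)) / 2
    return lo, hi

lo, hi = poisson_ci(11)
print(f"95% CI around a count of 11: {lo:.1f} to {hi:.1f}")  # about 5.5 to 19.7

# Exact conditional comparison of two Poisson counts: given 27 murders
# in total, a constant rate makes each murder equally likely to fall in
# either year, so test whether 16 of 27 is consistent with a fair coin.
print(stats.binomtest(16, 16 + 11, 0.5).pvalue)  # ~0.44: not significant
print(stats.binomtest(50, 50 + 31, 0.5).pvalue)  # <0.05: barely significant
```

Consistent with the conclusions above: the murder counts are easily explained by a constant rate, while the shooting counts are only just significantly different.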

The statistics and the code are here.

I only did all this and wrote this post because Hal Pashler saw someone tweet the NYPost piece. Hal knew I had previously looked into the statistics of proportions and asked whether the headline was justified. I invite others to disagree with my calculations if they have a better way of doing it. I don’t think different methods will give a very different result, however.

Scholarly publisher profit update

I made the below slide for a talk in 2012 to show that the biggest corporate scientific publishers are outrageously profitable.  But that was 3 years ago. How do they look now?


Outdated figures, created in 2012

The 40% figure for Wiley in my original slide may have been inflated, as someone helpfully explained on Twitter. The issue is that the figure (which I got from Heather Morrison’s thesis) did not have costs subtracted from it (as Heather explained to me later), unlike the figures for the other companies, which had costs subtracted to yield operating profit. Subtracting the costs yields a much lower profit of about 20%. This revised figure is, however, an underestimate, in that it doesn’t include the profits passed on to the societies with which many of the journals were published in partnership. But because Wiley has not released detailed numbers since 2012, I have dropped them from this update.

In my updated table (below), Rio Tinto’s operating profit margin is about 24%, based on $11.3 billion in operating profit and $47.7 billion in consolidated sales revenue in 2014 (p. 27 of their media release).

The 10% operating profit figure for BMW reflects their 2013 revenue of 76B and operating income of 7.85B. I didn’t find more recent figures, but indications are their profit hasn’t changed much.

Google’s profit was 25% for 2014. Income from operations was 16.5B, with revenue of 66B, according to their 10-K.

Apple at 29% for the year ending 27 Sep 2014 reflects operating income of 52.5B on net sales of 182.2B from their 10-K.

Springer seems not to have released detailed numbers since 2012, but in 2012 they wrote that sales were 981.1 million euros, with EBITDA of 342.8 million, for a margin of 35%. EBITDA differs from operating profit in that it excludes depreciation and amortization, so the true operating margin would be somewhat lower, though I suspect not dramatically so.

Elsevier’s journal publishing business is in its Scientific, Technical &amp; Medical division. That division reported, for 2014, an adjusted operating profit of £762m on £2,048m in revenue, or 37%. Their 2013 margin was also 37%.
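The margins quoted above are simple ratios of operating profit to revenue. As a quick check, they can be recomputed in a few lines (figures as quoted above, each in the company’s own reporting currency, in billions):

```python
# Operating profit margin = operating profit / revenue,
# using the figures quoted in the text above (billions).
figures = {
    "Rio Tinto (2014)": (11.3, 47.7),
    "BMW (2013)": (7.85, 76.0),
    "Google (2014)": (16.5, 66.0),
    "Apple (FY2014)": (52.5, 182.2),
    "Springer (2012)": (0.3428, 0.9811),  # EBITDA, not operating profit
    "Elsevier STM (2014)": (0.762, 2.048),
}
for company, (profit, revenue) in figures.items():
    print(f"{company}: {profit / revenue:.1%}")
```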

Because it’s difficult to know and allocate costs, these numbers should only be considered as ballpark estimates, and I’m unlikely to update the tables each time someone arrives at a different figure.

These Elsevier and Springer operations do more than publish journals; they also make money from a few other sources, such as databases of medical and legal information. Rather than their overall profit, we’d like to know how much they make from journal articles specifically. There’s no way to know this, because the companies don’t release that information. Elsevier even requires universities to sign confidentiality agreements so that no one can know what the subscriptions cost.

How about open-access publishers? The Public Library of Science (PLoS) is the largest (by number of articles published). They don’t charge subscriptions, making no money there, so anyone can read the articles they publish. However, they do charge most authors (or ultimately the authors’ funders) a fee for each manuscript. Although they don’t make profits, because they’re a non-profit organization, we can look at their numbers and calculate something like an operating surplus margin.

PLoS last published detailed figures for 2013. They reported gross revenue and support of 46.87M (after waiving publication fees for authors who couldn’t pay). 2013 saw an increase in net assets of 9.87M, which is 21% of the gross revenue and support. The majority of their articles appear in PLoS ONE, which charges the authors $1,350 per article. The surplus is eventually fed back into their operation, supporting further technological innovation in publishing, among other activities. I therefore think they shouldn’t be lumped together with for-profit publishers, but some people have asked me to compare them to the for-profits, so in the alternate version of the table below, I’ve included PLoS.


Hindawi charges authors only about $600 to publish each article. Despite this relatively low fee, they make an extraordinarily high profit of something like 52%. This may reflect some shortcuts and short-changing in their provision of services. As Jeffrey Beall has written, “Hindawi is not on my list of questionable publishers. I do receive complaints about Hindawi, however. They use spam a lot, most of their over 500 journals lack editors in chief, and it seems to be a publisher that focuses just on the authors’ needs and not so much the readers’.” Many open access publishers probably earn even higher profits, due to still-worse behavior. These can be classified as predators who scam unsuspecting authors.

While these profit figures show that the sciences clearly have enough money available to support publishing, the humanities are a different story. Many publishers in that area, such as university presses, are barely getting by, much like today’s newspaper publishers. So the large science, technology, and medicine publishers are outliers. Some, such as Elsevier, are still as fat as ever, suggesting that moves toward open access can go a lot further without endangering the provision of publishing services.

Disclosure: I am an editor for the open-access Registered Replication Reports, a type of article that appears in Perspectives on Psychological Science, a journal published by Sage for the Association for Psychological Science. Sage is a private company that is not required to report its financials. I wasn’t able to find profit figures for them.

Four Reasons to Oppose the Use of Elsevier’s Services for the Medical Journal of Australia

Elsevier has a history of unethical behaviour:
  1. Elsevier created fake medical journals to promote Merck products.
  2. Elsevier sponsored arms fairs for the international sale of weapons.
  3. Elsevier sponsored a bill that would have eliminated the NIH mandate that medical research be made freely available within 12 months of publication.
  4. Elsevier requires university and medical libraries to sign agreements that prevent them from reporting the exorbitant prices the libraries pay to subscribe to Elsevier’s journals.
Thanks to such practices, Elsevier makes an outrageous level of profit, 36% of revenue, higher than BMW and higher than the mining giant Rio Tinto. While researchers and research funders are attempting to transition medical and science publishing to an open access model, Elsevier seeks to hinder this transition. It is their corporate mandate to preserve the high level of profits they make by charging subscription fees for the articles that describe taxpayer-funded research.


Researchers ought to be using other providers, not channeling more money into Elsevier.


A “tell” for researcher innumeracy?

Evaluating scientists is hard work. Assessing quality requires digging deep into a researcher’s papers, scrutinising methodological details and the numbers behind the narrative. That’s why people look for shortcuts, such as the number of papers a scientist has published or the impact factor of the journals they publish in.

When reading a job or grant application, I frequently wonder: Does this person really take their data seriously and listen to what it’s telling them, or are they just trying to churn out papers? It can be hard to tell. But I’ve noticed an unintentional tell in the use of numbers. Some people, when reporting numbers, habitually report far more decimal places than are warranted.

For example, Thomson/ISI reports its much-derided journal impact factors to three decimal places. This is unwarranted, an example of false precision, both because of the low counts of article numbers and citations typically involved, and because their variability year to year is high. One decimal place is plenty (and given how poor a metric impact factor is, I’d prefer that impact factor simply not be used).
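To see why the third decimal place is meaningless, note that an impact factor is just a ratio of two modest counts. A tiny sketch, with hypothetical numbers for a small journal:

```python
# Impact factor = citations this year to items from the two previous
# years, divided by the number of citable items in those years.
# Hypothetical small journal: 118 citable items, 250 citations.
citations, items = 250, 118
print(f"{citations / items:.3f}")        # 2.119
print(f"{(citations + 1) / items:.3f}")  # 2.127: a single extra citation
                                         # already moves the second decimal
```

A single stray citation shifts the second decimal place, so the third decimal carries no information at all.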

When I see a CV with journal impact factor reported to three decimal places, I feel pushed toward the conclusion that the CV’s owner is not very numerate. So the reporting of impact factor is useful to me; not, however, in the way the researcher intended.

I don’t necessarily expect every researcher to fully understand the sizes, variability, and distribution of the numbers that go into impact factor, so I’m more concerned by how researchers report their own numbers. When to report all the decimal places calculated can, however, be a subtle issue, as full reporting of some numbers is important for reproducibility.

Bottom line: researchers should understand how summaries of data behave. Reporting numbers with faux precision is a bad sign.

For references on the issue of the third decimal place of impact factor:

UPDATE 8 May: Read this blog on the topic

Bar-Ilan, J. (2012). Journal report card. Scientometrics, 92, 249–260.

Mutz, R., & Daniel, H. D. (2012). The generalized propensity score methodology for estimating unbiased journal impact factors. Scientometrics, 92, 377–390.