problems with “controlling for” variables: quick notes

In science, we frequently see comparisons between two groups of people that differ on multiple demographic variables, say age, IQ, and income, on some dependent measure, say body mass index (BMI). Results are frequently reported as “the groups had substantially different BMIs, after controlling for X and Y”, using ANCOVA or multiple regression. We are given the impression that the analysis shows the groups would have different BMIs even if they had the same levels of X and Y.

Is this inference valid? Maybe. Extensive thought and scrutiny of the data would be required to determine whether it is reasonable. I was thinking of discussing this a bit in my undergraduate teaching, so I asked about it on twitter. A number of people provided helpful responses and asked me to report back.

An obvious problem is that the two groups may differ on other things besides X and Y, many of which may not even have been measured. So the difference between the groups may be entirely attributable to those confounds. This post is about some less obvious problems. Here are some quick snippets from what people pointed me to.

First, from Miller & Chapman (2001), below. Thanks to @BrandesJanina for pointing me to this paper.

consider a data set in which two groups are older men and younger women, and gender is of interest as an independent variable, Grp. Using age as a covariate does indeed remove age variance. The problem is that, because age and gender are correlated in this data set, removing variance associated with Cov will also remove some (shared) variance due to Grp. Within this data set, there is no way to determine what values of DV men younger than those tested or women older than those tested would have provided. Far from “controlling for” age, the ANCOVA will systematically distort the gender variable. As in our presentation of Lord’s Paradox above, Grpres will not be a valid measure of the construct of gender….

Consider a data set consisting of children’s age, height, and weight. If we conduct an ANCOVA in which height is the covariate, age is the grouping variable, and weight is the dependent variable, we are attempting to ask whether younger and older children would differ in weight if they did not happen to differ in height. If the groups indeed do not differ on the covariate, this question can be asked. But if there is something about the construct of age in childhood that inherently involves differences in height, the question makes no sense, because then age with height partialed out would no longer be age. There is no way to “equate” older and younger children on height, because growth is an inherent (not chance or noise) differentiation of the two groups….

Cohen and Cohen (1983) provided the following extreme example: “Consider the fact that the difference in mean height between the mountains of the Himalayan and Catskill ranges, adjusting for differences in atmospheric pressure, is zero!” (p. 425), the point being that one has not in any sense “equated” the two mountain ranges by using atmospheric pressure as a covariate.


-Miller & Chapman (2001)

Let’s go back to my opening example of a BMI difference between two groups, after “controlling for” variables statistically. What if one of those variables controlled for was age? Well, if the two groups were people who exercise and people who don’t, there is very likely variance shared by age and level of exercise, and age likely has a causal influence on exercise (by various routes), so the meaning of the exercise factor is unclear after age has been removed.
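To make this concrete, here is a minimal simulation of my own (a sketch, not anything from Miller & Chapman) of the older-men/younger-women situation described in the quote above. I assume the age–BMI relation is nonlinear and that gender has no causal effect at all; because the groups’ ages barely overlap, the linear “control” for age has to extrapolate far beyond the data.

```python
# A sketch only: simulated data, nonlinear age-BMI relation, no true gender effect.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200
age = np.concatenate([rng.normal(65, 4, n),    # "men": older
                      rng.normal(30, 4, n)])   # "women": younger
male = np.repeat([1, 0], n)
bmi = 20 + 0.0002 * (age - 20) ** 3 + rng.normal(0, 1, 2 * n)  # age is the only cause

df = pd.DataFrame({"bmi": bmi, "age": age, "male": male})
fit = smf.ols("bmi ~ male + age", data=df).fit()   # "controlling for" age, linearly
print(fit.params["male"], fit.pvalues["male"])
# The adjusted "gender" coefficient comes out large and highly significant even
# though gender plays no causal role: with essentially non-overlapping ages, the
# linear adjustment extrapolates far outside the data and distorts, rather than
# isolates, the group effect.
```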

The problem of measurement (un)reliability

From Westfall & Yarkoni (submitted):

Suppose we are given city statistics covering a four-month summer period, and observe that swimming pool deaths tend to increase on days when more ice cream is sold. As astute analysts, we immediately identify average daily temperature as a confound: on hotter days, people are more likely to both buy ice cream and visit swimming pools. Using multiple regression, we can statistically control for this confound, thereby eliminating the direct relationship between ice cream sales and swimming pool deaths.

Now consider the following twist. Rather than directly observing recorded daily temperatures, suppose we obtain self-reported Likert ratings of subjectively perceived heat levels. A simulated batch of 120 such observations is illustrated in Figure 1, with the reliability of the subjective heat ratings set to 0.40—a fairly typical level of reliability for a single item in psychology. Figure 2 illustrates what happens when the error-laden subjective heat ratings are used in place of the more precisely recorded daily temperatures. The simple relationship between ice cream sales and swimming pool deaths (Fig. 2A) is positive and substantial, r(118) = .49, p < .001. When controlling for the subjective heat ratings (Fig. 2B), the partial correlation between ice cream sales and swimming pool deaths is smaller, but remains positive and statistically significant, r(118) = .33, p < .001. Is the conclusion warranted that ice cream sales are a useful predictor of swimming pool deaths, over and above daily temperature? Obviously not. The problem is that subjective heat ratings are a noisy proxy for physical temperature, so controlling for the former does not equate observations on the latter. If we explicitly control for recorded daily temperatures (Fig. 2C), the spurious relationship is eliminated, as we would intuitively expect, r(118) = -.02, p = .81.

Given that most psychological measurements have considerable unreliability (lack of perfect correlation with the construct they are trying to get at), the problem is very general. It can lead both to spurious conclusions that a relationship exists and to spurious conclusions that one does not.
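Here is a minimal sketch of the same phenomenon (my own numbers and variable names, not Westfall & Yarkoni’s simulation): temperature drives both ice cream sales and pool deaths, the “subjective heat rating” is temperature plus enough noise to give a reliability of about 0.4, and we compare controlling for the noisy rating versus the true temperature.

```python
# A sketch only: simulated standardized variables, reliability of the rating ~0.4.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 120
temp = rng.normal(size=n)                           # true daily temperature
ice_cream = 0.7 * temp + 0.7 * rng.normal(size=n)   # driven by temperature
deaths = 0.7 * temp + 0.7 * rng.normal(size=n)      # also driven by temperature; no direct link
rating = temp + 1.2 * rng.normal(size=n)            # noisy self-report: reliability = 1/(1 + 1.2**2) ~ .41

df = pd.DataFrame({"ice_cream": ice_cream, "deaths": deaths, "temp": temp, "rating": rating})
print(smf.ols("deaths ~ ice_cream", data=df).fit().params["ice_cream"])           # clearly positive
print(smf.ols("deaths ~ ice_cream + rating", data=df).fit().params["ice_cream"])  # shrinks, stays positive
print(smf.ols("deaths ~ ice_cream + temp", data=df).fit().params["ice_cream"])    # near zero
```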

I do not use ANCOVA or GLMs in this way, so I may have given a misleading impression with some of what I have written or quoted above. If so, I would love to be corrected.

Bayesian jokes

It’s the end of the year, and I’m indulging myself by posting these Bayesian jokes. The first two were inspired by the #AcademicNoir hashtag.

 

 

 

And one outside the noir domain:

 

 

When retraction is not enough

A study reporting two experiments and suggesting that “Sadness impairs color perception” was recently retracted from Psychological Science. But some colleagues and I don’t think the retraction goes far enough.

In the retraction notice, the authors suggested that after revising their second experiment to address the problems that they noted with it, they would seek to re-publish their original Experiment 1 with the revised Experiment 2.

But Experiment 1, and the basic methodology behind both experiments, are shoddy: there are more problems than just those mentioned in the retraction notice. Some of these problems are strange anomalies in the data, specific to Thorstenson et al.’s experiments. Other problems, while serious, are not uncommon in this research area.

When the now-retracted paper first appeared, twitter exploded with criticism, and many documented the study’s problems extensively on their blogs. Five of us got together over email to write a letter to Psychological Science calling for retraction. But before submitting the letter, we contacted the first author, Chris Thorstenson. He eventually told us that he and his colleagues would retract the paper.

When we saw the retraction notice, we found that only a few of the problems with the experiments were mentioned. I have been vexed by studies of this ilk for more than three years, and would like to see a general improvement in this research area. So we revised our letter to highlight the additional problems not mentioned in the retraction notice. We hope that our revised letter will help Thorstenson et al., plus other researchers in this area, to improve their methods.

 

What just happened with open access at the Journal of Vision?

Vision researchers recently received an email from ARVO, the publisher of Journal of Vision, that begins:

On January 1, 2016, Journal of Vision (JOV) will become open access.

But in the view of most, JoV has been open access since its inception! It’s always been an author-pays, free access journal: all articles are published on its website and can be downloaded by anyone.

But free-to-download is not enough for open access, not according to the definition of open access formulated in Budapest in 2001. Open access means (according to this definition) the right not only to download but also to

distribute, … pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers

But JoV, which has always held the copyright of the articles it publishes, says that “All companies, commercial and nonprofit, should contact ARVO directly for permission to reprint articles or parts thereof”.

Starting in 2016, such permission won’t be required.

However, paying your $1,850 for standard publication in JoV in 2016 will not get you everything. The updated Budapest declaration recommends that journals use the license CC BY. But JoV’s publisher has instead chosen to use the license CC BY-NC-ND, meaning that articles cannot be used commercially (“non-commercial”) and that you can’t distribute bits of the article (“no derivatives”). Yet increasingly today, parts of science involve mining and remixing previously-published data and content, which the ND clause of the license prohibits (unless you get special permission). Education and journalism require re-use of bits too; think about how many textbooks and articles on the web show just one figure (a “derivative”) or illustration from a scientific paper.

And while the non-commercial (NC) clause might sound rather harmless for spreading knowledge, it is sometimes unclear what non-commercial really covers. It may prevent universities, especially private ones, from distributing the article as part of course content that a student pays for (via their tuition).

For these reasons, CC BY is the way we should be going, which is why UK funders like the Wellcome Trust and RCUK require that researchers receiving grants from them publish their articles CC BY. To accommodate this, JoV, as part of its new policy, will license your article as CC BY if you pay an additional fee of $500!

What ARVO has done here is only a small step forward for JoV, and unfortunately a rather confusing step. The bigger change has occurred with ARVO’s journal Investigative Ophthalmology & Visual Science (IOVS), which was accessible only via subscription but, starting in 2016, will publish articles under CC BY-NC-ND and CC BY.

As you can see, copyright is complicated. Researchers don’t have time to learn all this stuff. And that means recalcitrant publishers (not ARVO, I mean profiteers like Elsevier) can exploit this to obfuscate, complicate, and shift their policies to slow progress towards full open access.

Thanks to Tom Wallis and Irina Harris.

P.S. I think if ARVO had only been changing JoV‘s policies (rather than also the subscription journal IOVS) they wouldn’t have written “JoV will become open access” in that mass email. But because they did, it raised the issue of the full meaning of the term.

P.P.S. Partly because JoV is so expensive, at ECVP there’ll be a discussion of other avenues for open access publishing, such as PeerJ. Go! (I’ll be stuck in Sydney).

Yellow journalism and Manhattan murders

The headline screams “You’re 45% more likely to be murdered in de Blasio’s Manhattan”.

The evidence? Sixteen people have been killed so far this year in Manhattan, against only eleven over the same period last year.

Does this evidence indicate you are more likely to be murdered, as the headline says? To find out, I tested whether a constant murder rate could explain the results. The probability of getting murdered over the same period last year can be estimated as 11/Manhattan’s population = 11/1,630,000 = 0.0000067 = 0.00067%.

Is it likely that with the same murder rate this period this year, one would get a number as high as 16 murders? Yes.

This can be seen by calculating the 95% confidence interval for the proportion 11/1,630,000, which, according to three different statistical methods, spans roughly 5 to 20 when expressed as a number of murders. That is, even with a constant murder rate, due to statistical fluctuations the number of murders over this period could easily have been as low as 5 or as high as 20. It is just like flipping a coin 10 times: one may get 3 heads the first time and 6 the next, without the chance of a head changing.
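For anyone who wants to reproduce the interval, here is one way to do it (the post does not say which three methods were used; the three named below are my choices):

```python
# A sketch only: my choice of interval methods, applied to the counts in the post.
from statsmodels.stats.proportion import proportion_confint

population = 1_630_000
murders_last_year = 11
for method in ("wilson", "jeffreys", "beta"):   # "beta" is the Clopper-Pearson exact interval
    lo, hi = proportion_confint(murders_last_year, population, alpha=0.05, method=method)
    print(method, round(lo * population, 1), "to", round(hi * population, 1))
# Converted back to counts, each interval spans roughly 5-6 to 19-20 murders,
# so 16 murders is well within what an unchanged rate could produce.
```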

Doing this more properly means comparing the two rates directly.  I did this using three different methods, all of which found no significant difference.

The article also reports that the number of shooting incidents is higher this year, 50 instead of 31. Using the three different statistical methods again, this was (barely) significantly different. So here the journalist has a point. But this should be taken with a big grain of salt. Journalists are always looking for “news”, and if they repeatedly look at how many people have been murdered or shot, they are guaranteed to find an apparent difference eventually, because all possible statistical fluctuations happen sooner or later.
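The statistics and code I actually used are linked below; as an illustration, here is one standard way to compare two counts like these (a conditional binomial test, which is my choice of method here and not necessarily one of the three linked below): if the underlying rate is unchanged, then, given the combined total for the two years, the number falling in this year follows a Binomial(total, 0.5) distribution.

```python
# A sketch only: conditional binomial comparison of this year's count vs. last year's.
from scipy.stats import binomtest

# Murders: 16 this year vs. 11 over the same period last year
print(binomtest(16, 16 + 11, 0.5).pvalue)   # roughly 0.4: no evidence the rate changed
# Shooting incidents: 50 this year vs. 31 last year
print(binomtest(50, 50 + 31, 0.5).pvalue)   # roughly 0.04: borderline significance
```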

The statistics and the code are here.

I only did all this and wrote this post because Hal Pashler saw someone tweet the NYPost piece. Hal knew I had previously looked into the statistics of proportions and asked whether the headline was justified. I invite others to disagree with my calculations if they have a better way of doing it. I don’t think different methods will give a very different result, however.

Scholarly publisher profit update.

I made the below slide for a talk in 2012 to show that the biggest corporate scientific publishers are outrageously profitable.  But that was 3 years ago. How do they look now?


Outdated figures, created in 2012

The 40% figure for Wiley in my original slide may have been “overinflated”, as someone helpfully explained on twitter. The issue is that the figure (which I got from Heather Morrison’s thesis) did not have costs subtracted from it (as Heather explained to me later), unlike the figures for the other companies, which reflect operating profit. Subtracting the costs yields a much lower profit of about 20%. This revised figure is, however, an underestimate, because it doesn’t include the profits passed on to the societies in partnership with which many of the journals are published. But because Wiley has not released detailed numbers since 2012, I have dropped them from this update.

In my updated table (below), Rio Tinto’s operating profit margin is 23%, based on 11.3 billion in operating profit and 47.7 billion in consolidated sales revenue in 2014 (p. 27 of their media release).

publisherProfits2015edition

The 10% operating profit figure for BMW reflects their 2013 revenue of 76B and operating income of 7.85B. I didn’t find more recent figures, but indications are their profit hasn’t changed much.

Google’s profit was 25% for 2014. Income from operations was 16.5B, with revenue of 66B, according to their 10-K.

Apple at 29% for the year ending 27 Sep 2014 reflects operating income of 52.5B on net sales of 182.2B from their 10-K.

Springer seems not to have released detailed numbers since 2012, but in that year they wrote that sales were 981.1m euros, with EBITDA of 342.8m, for a margin of 35%. EBITDA differs from operating profit mainly in excluding depreciation and amortization, so the operating margin would be somewhat lower than 35%, but probably in the same ballpark.

Elsevier’s journal publishing business is in its Scientific, Technical, & Medical division. That division reported, for 2014, an adjusted operating profit of £762m on £2,048m in revenue, or 37%. Their 2013 margin was also 37%.
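For transparency, here is the arithmetic behind the percentages above, collected in one place (a sketch; the profit and revenue figures are simply those quoted in the surrounding paragraphs, in the currencies and years given there):

```python
# A sketch only: recomputing the quoted margins as profit (or EBITDA) divided by revenue.
figures = {
    "Rio Tinto (2014)":    (11.3, 47.7),      # USD billions, operating profit / revenue
    "BMW (2013)":          (7.85, 76.0),      # EUR billions
    "Google (2014)":       (16.5, 66.0),      # USD billions
    "Apple (FY2014)":      (52.5, 182.2),     # USD billions
    "Springer (2012)":     (342.8, 981.1),    # EUR millions, EBITDA / sales
    "Elsevier STM (2014)": (762.0, 2048.0),   # GBP millions, adjusted operating profit / revenue
}
for name, (profit, revenue) in figures.items():
    print(f"{name}: {profit / revenue:.1%}")
# Differences of a point or so from the rounded percentages in the text reflect rounding.
```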

Because it’s difficult to know and allocate costs, these numbers should only be considered as ballpark estimates, and I’m unlikely to update the tables each time someone arrives at a different figure.

These Elsevier and Springer operations do more than publish journals. They also make money from a few other sources, such as databases of medical and legal information. Rather than their overall profit, we’d like to know how much they make from journal articles specifically. There’s no way to know this, because the companies don’t release this information. Elsevier even requires universities to sign confidentiality agreements so that no one can know how much the subscriptions cost.

How about open-access publishers? The Public Library of Science (PLoS) is the largest (by number of articles published). They don’t charge subscriptions, so anyone can read the articles they publish and no money is made there. However, they do charge most authors, or ultimately the authors’ funders, a fee for each manuscript. Although they don’t make profits, because they’re a non-profit organization, we can look at their numbers and calculate something like an operating surplus margin.

PLoS last published detailed figures for 2013. They reported gross revenue and support of 46.87M (after waiving publication fees for authors who couldn’t pay). 2013 saw an increase in net assets of 9.87M, which is 21% of the gross revenue and support. The majority of their articles appear in PLoS ONE, which charges the authors $1350 per article. The surplus is eventually fed back into their operation, supporting further technological innovation in publishing, among other activities. I therefore think they shouldn’t be lumped together with for-profit publishers, but some people have asked me to compare them to the for-profits, so in the alternate version of the table below, I’ve included PLoS.

publisherProfitsIncludingPLoS2015edition

Hindawi charges only about $600 to authors to publish each article. Despite this relatively low cost, they  make an extraordinarily high profit of something like 52%. This may reflect some shortcuts and short-changing in their provision of services. As Jeffrey Beall has written, “Hindawi is not on my list of questionable publishers. I do receive complaints about Hindawi, however. They use spam a lot, most of their over 500 journals lack editors in chief, and it seems to be a publisher that focuses just on the authors’ needs and not so much the readers’.” Many open access publishers probably earn even higher profits, due to still-worse behavior. These can be classified as predators who scam unsuspecting authors.

While these profit figures show that the sciences clearly have enough money available to support publishing, the humanities are a different story. Many publishers in that area, such as university presses, are barely getting by, much like today’s newspaper publishers. So the large science, technology, and medicine publishers are outliers. Some, such as Elsevier, are still as fat as ever, suggesting that moves toward open access can go a lot further without endangering the provision of publishing services.

Disclosure: I am an editor for the open-access Registered Replication Reports, a type of article that appears in Perspectives on Psychological Science, a journal published by Sage for the Association for Psychological Science. Sage is a private company that is not required to report its financials. I wasn’t able to find profit figures for them.

Four Reasons to Oppose the Use of Elsevier’s Services for the Medical Journal of Australia

Elsevier has a history of unethical behaviour:
  1. Elsevier created fake medical journals to promote Merck products.
  2. Elsevier sponsored arms fairs for the international sale of weapons.
  3. Elsevier sponsored a bill that would have eliminated the NIH mandate that medical research be made freely available within 12 months of publication.
  4. Elsevier requires university and medical libraries to sign agreements that prevent them from reporting the exorbitant prices the libraries pay to subscribe to Elsevier’s journals.
Thanks to such practices, Elsevier makes an outrageous level of profit, 36% of revenue, higher than BMW’s and higher than the mining giant Rio Tinto’s. While researchers and research funders are attempting to transition medical and science publishing to an open access model, Elsevier seeks to hinder this transition. Its corporate mandate is to preserve the high level of profits it makes by charging subscription fees for the articles that describe taxpayer-funded research.

 


Researchers ought to be using other providers, not channeling more money into Elsevier.