scientific journal articles: Too much is left out June 1, 2009
Posted by alexholcombe in open access, science, science 2.0.
The current system of publishing scientific papers hampers scientific advancement in particular ways. I’d like to diagnose the problems so that new publishing efforts (like PLoS ONE) can be shaped to remedy them.
I believe that a major flaw in the modern journal system is that it discourages complete treatments of the literature. A few factors work together to cause this:
- Most prestigious papers nowadays are very short; high-impact, multidisciplinary journals, for example, mostly contain very short papers.
- To get their paper in, authors are often better off not mentioning problematic aspects of their paper or its relationship to the literature. After all, they can always work on them if the reviewers force them to, but why undermine their own argument and raise red flags for the reviewers and readers if they don’t have to?
- Even in long papers, most authors avoid direct criticism of papers if the issue is not critical to their thesis.
These problems combine to create a large “back story” of information behind every paper, to which most readers will be forever oblivious. And there is even more that I won’t discuss now: we all know papers are largely a fairy tale, one more digestible than an accurate record of the twists and turns that really happened in the lab would be.
To the second factor, one might respond that in practically all writing of any kind, authors have an incentive to sweep their problems under the rug. Perhaps there is little hope of remedying the problem. But the problem is worse than it need be. For one thing, knowledge of the problems with an article is available, but suppressed. During the review process, journals ask experts in the area to evaluate the science of a manuscript in depth. Reviewers make very extensive comments, including links to other literature and concerns about the methods. Unfortunately, very few journals actually publish these comments (there are reasons for this, but they could be published more often). The authors incorporate many of the comments into their manuscript revisions, but many are left out. Even those concerns that result in the manuscript being modified are usually hard for all but the most expert readers of the final manuscript to detect.
Besides the reviewers, many readers immediately recognize problems with a new article and unmentioned links to other scientific results. Traditionally, publishing these concerns was pretty difficult. Now that some journals allow anyone to post comments on an article, there is no real obstacle. Nevertheless, the average scientist does not post comments, probably because they believe they are more likely to be punished for doing so (by the authors responding in kind with criticism) than rewarded (they cannot put the comments on their all-important academic CV). Many people (e.g. Cameron) have thought about how to reform the system to reward these comments, and I’ll set that aside for now.
Regarding the third factor above, in some instances of “selective citation”, the authors have omitted citing a paper because they have a reason not to believe its conclusion. They may have spotted a logical or technical methodological problem, or made an unreported attempt at replication that failed. I have done this many times myself.
Do you agree that these are major problems? How can we address them?
I’ll describe some ideas for partial remedies in subsequent posts.
Isfahan, Iran May 28, 2009
Posted by alexholcombe in history.
Isfahan, Iran has one of the most beautiful plazas in the world. A gorgeous book about that city has been redone and includes a full-page version of this photograph I took of its Friday Mosque. I hope the book, Isfahan: Pearl of Persia, gets the readers it deserves. I also highly recommend visiting Iran yourself.

More photos
scientific computing with Python webinar May 22, 2009
Posted by alexholcombe in Python, programming, science.
https://www1.gotomeeting.com/register/422340144
Fri, May 22, 2009 3:00 PM – 3:30 PM CDT
description: http://blog.enthought.com/?p=116
They will try to record it for later playback.
UPDATE: The recording and announcement of future broadcasts are at the Enthought site.
ggplot2 quickly makes beautiful plots in R May 18, 2009
Posted by alexholcombe in programming, science.
Earlier I wrote about the open-source free tools I use to plot and analyze my data—Python and R. One of the most time-consuming and fiddly parts of making graphs for our papers is the need to:
- plot multiple subsets of the data (different experimental conditions), sometimes with double axes
- make a whole array of plots, one for each of the experimental participants’ data
I’ve been doing this in Python with SciPy by coding an outer loop that iterates over the different participants’ data and an inner loop that iterates through the experimental conditions. I also write code to label all the conditions and participants, put the horizontal axes only on the bottom-most plots and the vertical axes only on the left-most plots, and sometimes code to offset the different conditions’ data points slightly so they don’t completely occlude each other. This can be a huge pain.
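To give a feel for the bookkeeping involved, here is a minimal sketch of that nested loop. It is an illustration only, not the code from my lab: the function name `plan_panels`, the participant and condition labels, and the `jitter` parameter are all hypothetical, and the actual drawing calls are left out so that only the layout logic remains.

```python
def plan_panels(participants, conditions, ncols, jitter=0.05):
    """Describe each participant's panel in a grid that is ncols wide.

    Hypothetical helper: it only works out the layout decisions
    (which panels carry axes, how conditions are offset), not the drawing.
    """
    n = len(participants)
    n_cond = len(conditions)
    panels = []
    for idx, participant in enumerate(participants):   # outer loop: one panel per participant
        row, col = divmod(idx, ncols)
        offsets = {}
        for k, cond in enumerate(conditions):          # inner loop: one data series per condition
            # Centre the offsets on zero so the conditions straddle the true x value.
            offsets[cond] = (k - (n_cond - 1) / 2) * jitter
        panels.append({
            "participant": participant,
            "row": row,
            "col": col,
            # Bottom-most in its column: no panel sits directly below this one.
            "show_x_axis": idx + ncols >= n,
            # Left-most: first column of the grid.
            "show_y_axis": col == 0,
            "x_offsets": offsets,
        })
    return panels


# Made-up labels, purely for illustration.
for panel in plan_panels(["P1", "P2", "P3", "P4", "P5"], ["slow", "medium", "fast"], ncols=2):
    print(panel["participant"], panel["row"], panel["col"],
          panel["show_x_axis"], panel["show_y_axis"])
```

Even with the plotting stripped away, every one of these decisions has to be coded by hand, which is where the pain comes from.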
Recently Dani has discovered a much better way. There is a library of R code called ggplot2 that does all these things for you and more to yield really beautiful and clean arrays of plots. Dani has posted an example of the few lines of code needed and the resulting plot. Ideally I would like to do all this in Python without having to use R but the ggplot2 R library is wonderful and I haven’t seen anything remotely like it (yet) for Python.
Wall of Shame & Schadenfreude April 19, 2009
Posted by alexholcombe in open access, science.
In my graduate school days, we had something called ‘The Scathing Wall’, a small bulletin board which provided some outlet for one of our greatest frustrations—dealing with the reviews that came back from a journal after we had submitted a manuscript. On the margins of a few particularly unpleasant reviews, some in the lab had scribbled their complaints about the unintelligible things the (anonymous) reviewers had written, the unmitigated idiocy of the reviewers, or the intransigence of the action editor.
This humble Wall served a valuable educational function, giving insight into the journal publication process that was otherwise very opaque to a beginning researcher like myself. Seeing the failures of others in my lab could help save me from making the same mistakes, as well as make me less discouraged by the rejection of my own work by journals—on the Wall I could see that it happened even to my scientific idols.
The example of that little bulletin board has inspired me to create larger installations, for the entire departments at which I’ve taught, that I call ‘Walls of Shame & Schadenfreude’. I think they help plug an important gap in the average PhD program’s curriculum: what to expect when submitting to journals and how to deal with reviewers and editors. The best pieces on the Walls are probably those that show an extended back-and-forth between journal and author: manuscript submissions and re-submissions, rejection letters, responses to reviewers, appeals of the editor’s decision, and eventual publication. Students can see that even professors sometimes have to go through hell to get their work published. To gather material for the Wall, I sent the email pasted below to everyone in my department. I hope this post will inspire others to do something similar at other universities.
Dear fellow academics,
Please submit the most hurtful reviews of your manuscripts and grants to the School’s unofficial and in-development Wall of Shame & Schadenfreude.
The idea is a bulletin board, to be located in the postgraduate area, to educate postgraduates about the publication process. Various members of the school, from lowly lecturers to lofty professors, would display examples from journals of action letters and reviews which reject their manuscripts or grant applications. Only particularly vicious, demeaning, or dismissive reviews will be accepted. This Wall of Shame & Schadenfreude would show that even the most respected researchers among us can and do fail. Members of the school, especially lecturers and postgrads, might visit the Wall each time another of their papers is rejected, for inspiration during the revision and resubmission process. Also the WoSS would be a continuing source of amusement and, of course, the fine pleasures of Schadenfreude should not be underestimated. Especially educational would be if a negative review could be annotated with marginalia by the submitter, to narrate the history of the manuscript, for example explaining how many times and to how many journals the manuscript was submitted, or pointing out the inanities of the review, or highlighting the particularly risible parts.
There should be some redaction of reviewer identities in the reviews in order to protect the confidentiality of the review process, but the identity of the person rejected should not be redacted, as exposing this—that even the mighty among us receive harsh rejections—is part of the point. As to the location of the wall’s erection, it should be a private and humble, albeit soon to be hallowed, place. We would not want a chancellor, APAG committee, or other humorless entity to stumble upon this shrine. A suitable location may be a portion of the postgraduate ‘fishbowl’ glass walls that enclose the kitchen area.
Please send me an email if you have any demeaning reviews that you’d consider contributing. It seems unlikely that I will be overwhelmed with submissions, but I’m hoping that a brave few are self-deprecating enough to get us started.
open research through automated lab archiving April 6, 2009
Posted by alexholcombe in open access, programming, science.Tags: open research
My scientific workflow includes email between myself and lab members and collaborators, annotations on previously published papers, adding information and ideas to the lab wiki, Python programs to create visual displays and run experiments with them, and Python and R code to plot the results and do the statistical analysis.
A long-term goal is to link these things together so that each project in the lab has an electronic paper trail of the different Python programs that were written, emails that were exchanged, experiment variants that were tried, data that were collected, statistical analyses that were conducted, and manuscripts that were written.
I want all this for two reasons.
- Most importantly, to compensate for my failing ability to keep abreast of all the lab projects and remember everything we’ve done for each. I should be able to type ‘Frohlich’ into a searchbox somewhere and see all the materials related to our experiments on the Frohlich effect.
- Second, to move closer to open research, where others can see what we’re doing, get in touch if they’re interested in collaborating or know something relevant, and see our unpublished negative results so that they needn’t repeat all the associated work.
There has been a lot of discussion recently about how to move towards open research and open notebook science (e.g. Cameron Neylon and Ian Davis). I don’t know how much others are already saying this, but to me the way forward is to create a system that would appeal to nearly every scientist’s desire for #1 above regardless of whether they support open research: a system to archive what the lab does in an organized fashion so that each lab member can access it. I find this easiest to do by using web archiving systems like Google Groups for lab email and files. I don’t have most steps working in an easy and automated fashion, and I don’t have the programming skills to get them working with necessary ingredients like version history, linking of code and output files, etc. However, once someone does, bona fide open research would be just a matter of changing the permissions so that anyone in the world can access it.
Data analysis with Python, SciPy and R April 5, 2009
Posted by alexholcombe in probability and statistics, programming, psychology, science.Tags: open source
I’ve transitioned to all open-source software for my science. The Python language and its libraries VisionEgg and PsychoPy are more than sufficient to code my perception experiments. For data analysis, I’ve gotten pretty far with the SciPy library for Python, which has probability distributions, function minimization, Fourier transforms, etc. The Matplotlib library makes it easy to make plots in a way familiar to old MATLAB users like me. Unfortunately, however, it appears that nothing is available for taking a load of data, formatted as many entries (e.g. rows) each of which has several values associated with it (one for each independent and dependent variable of the experiment), and
- summarizing the dependent variable (calculating the mean, etc.) contingent on various independent variables (like an Excel pivot-table)
- performing the all-important (in experimental psychology and neuroscience) multiple linear regression and ANOVAs.
I wrote something for #1, but #2 is too much for me. I have had to start using R.
R appears to be the best open-source data analysis and statistics program, and has an incredible variety of packages for all sorts of analyses, often programmed as soon as a statistics professor dreams it up. For example, there is a package for the directional statistics I need, which I don’t think you can find in SPSS or SAS. The R syntax is really clunky, as opposed to the beauty that is Python, which is irritating but doesn’t actually slow one down much.
Fortunately, RPy2 allows one to call R functions from Python. It’s a fairly basic interface and it took me a while to understand how to pass data between Python and R, but it works well. I’m very grateful to the developers, who deserve more help.
The documentation of all these Python libraries leaves a lot to be desired. The example code snippets for SciPy are still too sparse, and more are sorely needed to help users quickly do specific things without having to spend an hour figuring out exactly what some poorly-documented function’s parameters do. The same goes for RPy. I hope to help out when I have time.
summarizing data by combinations of variables with python January 26, 2009
Posted by alexholcombe in science.Tags: code, programming, Python, SciPy
For data analysis, I switched from using MATLAB, partially motivated by a desire to support open source, to using R. But my experiments nowadays are written in Python, so I decided to try analyzing the data with Python as well.
SciPy is an open-source library that helps with this, and duplicates a lot of MATLAB functionality to make it easier to switch from MATLAB. IPython provides an interactive command line with tab-completion, history, and some of the other conveniences that come with MATLAB. It’s been working well for my data plotting, except my code was becoming cumbersome when it came to extracting the data I wanted to plot. The loadtxt function easily imports my data files in a structure called a recarray, similar to a data.frame in R, a lot like a flat spreadsheet with a name for each column. Then, I need to plot the dependent variable as a function of a subset of the independent variables in the experiment, like this: 
Here I plotted the mean shift, and the standard deviation of the shift, by observer (columns), eccentricity, and direction of motion (colors). This requires collapsing across the other variables that you can’t see here. I think this involves a “PivotTable” in Excel terminology. For Python, I wrote a function to which I pass a recarray and the names of the variables (datafile columns) that I want to collapse by, and it passes back multi-dimensional arrays providing the mean, standard deviation, and number of data points for every combination of the variables.
collapseBy(data,DV,*factors)
I hope someone finds this code as useful as I do; it seems something like this should be put into SciPy.
Update: Josef schooled me (in a helpful way!) by writing new code for this functionality in three different ways, with each way much cleaner than mine.
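For readers who just want the gist, here is a minimal standard-library sketch of the same idea. It is not my original function or any of Josef’s versions: the name `collapse_by`, the column names, and the example values are made up for illustration. It takes a list of dicts rather than a recarray and returns a dict keyed by each combination of factor levels rather than multi-dimensional arrays. I have also assumed the sample standard deviation here.

```python
from collections import defaultdict
from statistics import mean, stdev


def collapse_by(rows, dv, *factors):
    """Summarise the dependent variable `dv` for every combination of `factors`.

    `rows` is a list of dicts, one per trial (a stand-in for a recarray).
    Returns {(level_1, level_2, ...): {"mean": ..., "std": ..., "n": ...}}.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in factors)   # one key per combination of factor levels
        groups[key].append(row[dv])

    summary = {}
    for key, values in groups.items():
        summary[key] = {
            "mean": mean(values),
            # The sample standard deviation is undefined for a single data point.
            "std": stdev(values) if len(values) > 1 else float("nan"),
            "n": len(values),
        }
    return summary


# Made-up trials, purely for illustration.
trials = [
    {"observer": "P1", "direction": "left", "shift": 1.0},
    {"observer": "P1", "direction": "left", "shift": 3.0},
    {"observer": "P1", "direction": "right", "shift": 2.0},
    {"observer": "P2", "direction": "left", "shift": 4.0},
    {"observer": "P2", "direction": "left", "shift": 6.0},
    {"observer": "P2", "direction": "left", "shift": 8.0},
]

for key, stats in sorted(collapse_by(trials, "shift", "observer", "direction").items()):
    print(key, stats)
```

Collapsing across a variable is then just a matter of leaving it out of the factor list, e.g. `collapse_by(trials, "shift", "observer")`.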
The binding problem: A new encyclopedia entry December 12, 2008
Posted by alexholcombe in neuroscience, open access, psychology.
The conventional encyclopedia: old and unimproved!
Here is a preprint of my entry for The Sage Encyclopedia of Perception with headword “The Binding Problem”. The hardcopy version of the encyclopedia will be a massive 1100-page tome with hundreds of contributors. Sadly, this is very much a conventional, 20th-century-style encyclopedia: the style guide prohibited me from referencing the original papers I was referring to. The only way I could point the reader towards the original research was to say things like “Smith has shown …” or “In 1980, Treisman proposed …”.
Perhaps this encyclopedia style made sense back before the internet, when limited space might prevent actual referencing, and anyway the average reader had no ability to access original papers. So Britannica had an excuse to adopt their lofty tone which almost gives the impression they created all the knowledge themselves. But writing in 2008, to me it felt unconscionable to describe all these discoveries in an academic publication without giving credit where it’s due. So in the preprint I’ve posted, I’ve added all the references in as if it were a modern academic publication. And I’m posting this now, a good 10 months or more before the encyclopedia is actually published. I expect that ten months from now, the entry may be embarrassingly out of date.
Sage has dozens of encyclopedias like this in the works, all of which presumably have these major shortcomings of long publication lag and impoverished referencing, but apparently they still think they will make money. They are charging $450 for the Perception volume my entry will appear in!
To me, the open-access approach exemplified by Scholarpedia is the only way to go, because:
- It is free. So the readership is tens or hundreds of times larger.
- It is published nearly instantly, so it is not already out of date the day it appears.
- It does not kill trees.
- It is easily updated.
- Its authors have no incentive to undermine it, as I have done to the Sage Encyclopedia so that my work can be seen by those who are unwilling to pay $450 for it! By the way, posting a preprint as I have done is almost always legal.
Finally, online publishing projects like Scholarpedia do not have arbitrary word limits. Without the arbitrary limit in this encyclopedia, I would not have to apologize to those whose work on binding I left out. But there is evidence that overall, I omitted things fairly: I’m upset about leaving so much of my own work out (e.g. Holcombe 2008; Holcombe & Cavanagh 2008; Holcombe & Judson 2007; Holcombe & Cavanagh 2001).
Alex O. Holcombe (2009). The binding problem. The Sage Encyclopedia of Perception.
don’t know much ’bout neural networks? An interactive tutorial December 10, 2008
Posted by alexholcombe in neuroscience, psychology.
I’m releasing an interactive tutorial suitable either for individual learning or for a class wherein each student, or pair of students, has a computer. I used it for my third-year psychology university students. Before beginning the 100-minute class, most had little idea how connectionist networks could store memories or compute visually guided action. By the end, they were happily rewiring their networks to encode new memories or accomplish new actions. It’s all made possible by the free, beautiful, and easy-to-use Java-based neural network simulator SimBrain that I blogged about earlier.
SimBrain comes with a large number of tutorials, but these are designed for an entire course on neural networks. I needed one that could fit a single 90-minute class, so I created my own, which is basically just a modification of the great content they already released. I’ve posted the network files to be used with SimBrain, plus the instructions, here. The instructions are broken up into several separate webpages. Each one ends with an exercise for the students to try. The subsequent webpage discusses the answer to the exercise a bit. During the actual class we teach, we prevent the students from proceeding immediately to the subsequent webpage by password-protecting the pages, and giving them the password after they’ve made an effort. I might be able to provide access to that version upon request. Let me know of any problems, and whether you find the tutorial useful.