## Data analysis with Python, SciPy and R

I’ve transitioned to all open-source software for my science. The Python language and its libraries VisionEgg and PsychoPy are more than sufficient to code my perception experiments. For data analysis, I’ve gotten pretty far with the SciPy library for Python, which has probability distributions, function minimization, Fourier transforms, etc. The Matplotlib library makes it easy to make plots in a way familiar to old MATLAB users like me. Unfortunately, nothing appears to be available for taking a load of data formatted as many entries (e.g. rows), each with several values associated with it (one for each independent and dependent variable of the experiment), and

- summarizing the dependent variable (calculating the mean, etc.) contingent on various independent variables (like an Excel pivot table)
- performing the all-important (in experimental psychology and neuroscience) multiple linear regression and ANOVAs.

I wrote something for the first task, but the second is too much for me, so I have had to start using R.
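To make the first task concrete, here is a minimal sketch of a pivot-table-style summary using only the Python standard library. It is not the code mentioned above; the function name `summarize`, the column names (`subject`, `contrast`, `rt_ms`) and the data are all hypothetical, chosen just for illustration.

```python
from collections import defaultdict
from statistics import mean


def summarize(rows, factors, dv):
    """Return the mean of column `dv` for each combination of `factors`.

    `rows` is a list of dicts, one per trial (an assumed data layout).
    """
    groups = defaultdict(list)
    for row in rows:
        # The key is the tuple of independent-variable values for this trial.
        key = tuple(row[f] for f in factors)
        groups[key].append(row[dv])
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical data: one entry per trial, reaction time in milliseconds.
trials = [
    {"subject": "s1", "contrast": "low", "rt_ms": 500},
    {"subject": "s1", "contrast": "low", "rt_ms": 700},
    {"subject": "s1", "contrast": "high", "rt_ms": 400},
    {"subject": "s2", "contrast": "high", "rt_ms": 300},
    {"subject": "s2", "contrast": "high", "rt_ms": 500},
]

# Mean reaction time per contrast level, then per subject and contrast.
by_contrast = summarize(trials, ["contrast"], "rt_ms")
by_subject_and_contrast = summarize(trials, ["subject", "contrast"], "rt_ms")
```

Keys are tuples of factor values, so `by_contrast[("low",)]` holds the mean for the low-contrast trials and `by_subject_and_contrast[("s2", "high")]` holds the mean for one subject in one condition.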

R appears to be the best open-source data analysis and statistics program, and has an incredible variety of packages for all sorts of analyses, often programmed as soon as a statistics professor dreams them up. For example, there is a package for the directional statistics I need, which I don’t think you can find in SPSS or SAS. The R syntax *is* really clunky compared with the beauty that is Python; that is irritating, but it doesn’t actually slow one down much.

Fortunately, RPy2 allows one to call R functions from Python. It’s a fairly basic interface, and it took me a while to understand how to pass data between Python and R, but it works well. I’m very grateful to the developers, who deserve more help.

The documentation of all these Python libraries leaves a lot to be desired. The example code snippets for SciPy are still too sparse, and more are sorely needed to help users quickly do specific things without having to spend an hour figuring out exactly what some poorly documented function’s parameters do. The same goes for RPy2. I hope to help out when I have time.

Update: some RPy help

Update: Stack Overflow has some helpful answers to questions about how to use RPy2.

[...] trackback Earlier I wrote about the open-source free tools I use to plot and analyze my data—Python and R. One of the most time-consuming and fiddly parts of making graphs for our papers is the need [...]

ceptional, May 18, 2009 at 3:58 am

R is very thorough for statistics. However, if there is something that SciPy can do, I jump to it. Writing scripts and trying to debug code in R is such a pain… Such a pain! However, things like ANCOVA are still impossible in Python.

Shax, May 27, 2010 at 5:33 am

@Shax: my feelings exactly!

alexholcombe, May 27, 2010 at 5:51 am