Data analysis with Python, SciPy and R
I’ve transitioned to using only open-source software for my science. The Python language and its libraries VisionEgg and PsychoPy are more than sufficient to code my perception experiments. For data analysis, I’ve gotten pretty far with the SciPy library for Python, which provides probability distributions, function minimization, Fourier transforms, and more. The Matplotlib library makes plotting easy in a way familiar to old MATLAB users like me. Unfortunately, however, nothing seems to be available for taking a load of data, formatted as many entries (e.g. rows) that each have several values (one for each independent and dependent variable of the experiment), and then:
- summarizing the dependent variable (calculating the mean, etc.) contingent on various independent variables, like an Excel pivot table
- performing the multiple linear regressions and ANOVAs that are all-important in experimental psychology and neuroscience.
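The first task is the easier one. A minimal sketch of a pivot-table-style summary in plain Python, assuming each trial is stored as a dict; the column names (`subject`, `contrast`, `rt`), the trial values, and the `pivot_mean` helper are all hypothetical and only illustrate the idea of grouping by any combination of independent variables and averaging the dependent variable.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trial records: independent variables "subject" and "contrast",
# dependent variable "rt" (response time in seconds).
trials = [
    {"subject": "S1", "contrast": 0.1, "rt": 0.52},
    {"subject": "S1", "contrast": 0.1, "rt": 0.48},
    {"subject": "S1", "contrast": 0.5, "rt": 0.40},
    {"subject": "S2", "contrast": 0.1, "rt": 0.61},
    {"subject": "S2", "contrast": 0.5, "rt": 0.45},
    {"subject": "S2", "contrast": 0.5, "rt": 0.43},
]

def pivot_mean(rows, by, value):
    """Mean of column `value` for each combination of the columns named in `by`."""
    groups = defaultdict(list)
    for row in rows:
        # The tuple of independent-variable values identifies the cell.
        key = tuple(row[col] for col in by)
        groups[key].append(row[value])
    return {key: mean(vals) for key, vals in sorted(groups.items())}

summary = pivot_mean(trials, by=("subject", "contrast"), value="rt")
for key, m in summary.items():
    print(key, round(m, 3))
```

Passing a different `by`, e.g. `("contrast",)`, collapses across subjects, which is most of what a pivot table is used for.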
I wrote something for the first task, but the second is too much for me, so I have had to start using R.
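The point estimates of a multiple linear regression, at least, can be had from NumPy’s least-squares solver. A minimal sketch, assuming two hypothetical independent variables `x1` and `x2` and simulated data with known coefficients so the fit can be checked; it recovers the intercept and slopes and computes R², but it gives none of the standard errors, p-values or ANOVA tables that R’s `lm` and `aov` report.

```python
import numpy as np

# Simulated experiment: y depends linearly on two independent variables plus noise.
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(0.0, 1.0, n)
x2 = rng.uniform(0.0, 1.0, n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(0.0, 0.05, n)

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), x1, x2])

# Ordinary least squares: beta minimizes ||X @ beta - y||^2.
beta, residuals, rank, sing_vals = np.linalg.lstsq(X, y, rcond=None)

# Proportion of variance in y explained by the fit.
fitted = X @ beta
ss_res = np.sum((y - fitted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print("intercept, slope_x1, slope_x2:", np.round(beta, 2))
print("R^2:", round(float(r_squared), 3))
```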
R appears to be the best open-source data analysis and statistics program, and it has an incredible variety of packages for all sorts of analyses, often written as soon as a statistics professor dreams them up. For example, there is a package for the directional statistics I need, which I don’t think you can find in SPSS or SAS. R’s syntax is really clunky compared with the beauty of Python; that is irritating, but it doesn’t actually slow one down much.
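Directional data need their own statistics because angles wrap around: the ordinary mean of 350° and 20° is 185°, pointing away from both. The circular mean averages unit vectors instead. A minimal plain-Python sketch; the helper names are hypothetical and this is not the R package mentioned above, though SciPy’s `scipy.stats.circmean` computes the same mean direction (in radians by default).

```python
import math

def circular_mean(angles_deg):
    """Mean direction, in degrees in [0, 360), of angles given in degrees."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    # atan2 gives the direction of the summed unit vectors; wrap into [0, 360).
    return math.degrees(math.atan2(s, c)) % 360.0

def mean_resultant_length(angles_deg):
    """Length of the mean unit vector: 1 if all angles agree, near 0 if spread out."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.hypot(s, c) / len(angles_deg)

responses = [350, 20]  # hypothetical response directions in degrees
print(sum(responses) / len(responses))     # naive mean: 185.0
print(round(circular_mean(responses), 6))  # circular mean: 5.0
print(round(mean_resultant_length(responses), 4))
```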
Fortunately, RPy2 allows one to call R functions from Python. It’s a fairly basic interface, and it took me a while to understand how to pass data between Python and R, but it works well. I’m very grateful to its developers, who deserve more help.
The documentation of all these Python libraries leaves a lot to be desired. Example code snippets for SciPy are still too sparse; many more are needed so users can quickly do specific things without spending an hour figuring out exactly what some poorly documented function’s parameters do. The same goes for RPy2. I hope to help out when I have time.
Update: some RPy help
Update: StackOverflow has some helpful answers for questions regarding how to use RPy2