tl;dr R is a powerful, effective, diverse, well-supported, de-facto-standard, free nightmare. As far as statistical languages go, this is outstandingly good, so I may as well learn it.
Pros and cons
- combines unparalleled breadth and community, at least as pertains to statisticians, data miners, machine learners and other such assorted folk as I call my colleagues. To get some sense of this thriving scene, check out R-bloggers. That community alone is enough to sell R, whatever you think of the language (cf “Your community is your best asset”). And believe me, I have reservations about everything else.
- amazing, statistically-useful plotting (cf, e.g., the awful battle to get error bars in Python’s mayavi)
- online web-app visualisation: shiny
- integration into literate coding and reproducible research through knitr - see scientific writing workflow.
- Poetically, R has random scope amongst other parser and syntax weirdness.
- Call-by-value semantics (in a “big-data” processing language?)
- …ameliorated not even by array views,
- …exacerbated by bloaty design
- Object model tacked on after the fact… in fact, several object models, which is fine? I guess? maybe, but…
- …if the object model stuff is a multi-standard compatibility disaster, I’d like the trade-off to be speed, or functional design features, or some other such modern convenience. Nah.
- One of the worst names to google for ever (cf Processing, Pure)
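The call-by-value complaint in the list above can be seen in a few lines. This is a minimal sketch in base R; `zero_first` is a hypothetical helper invented for the illustration. Strictly speaking R copies lazily (copy-on-modify), but the caller-visible effect is the one shown: the function works on its own copy.

```r
# Hypothetical helper: overwrite the first element of its argument.
zero_first <- function(v) {
  v[1] <- 0  # modifies the function's own copy, not the caller's object
  v
}

x <- c(1, 2, 3)
y <- zero_first(x)

x  # the caller's vector is unchanged
y  # only the returned copy carries the modification
```

For large data this means an "in-place" update is usually spelled `x <- f(x)`, and each modification inside `f` may trigger a full copy.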
Most lucid explanation of everything is Hadley Wickham’s Advanced R book, which is free and only “Advanced” in the sense that your knowledge of R will be advanced after reading it, not in the sense of being forbiddingly complicated for beginners.
Functional prog hacks
- purrr “A FP package for R in the spirit of underscore.js”
- magrittr brings a compose (“pipe”) operator to R:
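A small sketch of the pipe, assuming the magrittr package is installed. The vector and the choice of functions are made up for the illustration; the point is only that `x %>% f()` means `f(x)`, so a chain reads left to right instead of inside out.

```r
library(magrittr)

x <- c(1, 4, 9)

# Piped form: take x, then sqrt, then sum.
piped <- x %>% sqrt() %>% sum()

# Equivalent nested form.
nested <- sum(sqrt(x))

identical(piped, nested)
```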
Useful functions: semi_join etc. plyr and dplyr are the essential packages.
To subset a list based object:
to subset and optionally downcast the same:
to subset a matrix-based object:
x[1, , drop=FALSE]
to subset and optionally downcast the same:
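The four subsetting cases above can be sketched in base R as follows. The objects `lst` and `m` are made up for the illustration, and this is a reconstruction under the usual reading of "downcast" (dropping to a simpler type), not necessarily the original snippets.

```r
lst <- list(a = 1:3, b = "text")

sub_list <- lst[1]    # subset: still a list, of length 1
elem     <- lst[[1]]  # subset and downcast: the integer vector itself

m <- matrix(1:6, nrow = 2)

row_mat <- m[1, , drop = FALSE]  # subset: still a matrix, 1 x 3
row_vec <- m[1, ]                # subset and downcast: a plain vector

str(sub_list)
str(elem)
dim(row_mat)
dim(row_vec)  # NULL, because the dim attribute was dropped
```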
How to pass sparse matrices between R and Python
Counter-intuitively, this FS-backed method was a couple of orders of magnitude faster than rpy2 last time I tried to pass more than a few MB of data.
Apparently you are supposed to use feather for this these days, although there is no documentation.
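One filesystem-backed route, sketched under assumptions: the interchange format here is Matrix Market, written with the Matrix package that ships with R. The file name and the example matrix are invented, and the Python half appears only as a comment (`scipy.io.mmread` is the usual reader). This is one plausible way to do the hand-off, not necessarily the method timed above.

```r
library(Matrix)

# A small sparse matrix built from (row, column, value) triplets.
m <- sparseMatrix(i = c(1, 3, 4), j = c(2, 1, 4),
                  x = c(1.5, -2, 7), dims = c(4, 4))

# Write it to disk in Matrix Market format (path is illustrative).
path <- file.path(tempdir(), "m.mtx")
invisible(writeMM(m, path))

# Python side (assumed): scipy.io.mmread(path).tocsr()

# Round trip in R to check that nothing was lost.
back <- readMM(path)
all(as.matrix(back) == as.matrix(m))
```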
Upgrading R breaks the installed packages
update: no longer broken by default in latest R!
This is the fix:
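A commonly used remedy, offered as an assumption about what belongs here: ask `update.packages` to reinstall anything built under an older R via its `checkBuilt` argument. The wrapper name `rebuild_stale_packages` is hypothetical, and the function is only defined here, not called, since calling it hits the network and reinstalls packages.

```r
# Hypothetical wrapper: reinstall packages built under a previous R version.
rebuild_stale_packages <- function(lib = .libPaths()[1]) {
  update.packages(lib.loc = lib,
                  checkBuilt = TRUE,  # treat packages built for older R as outdated
                  ask = FALSE)        # do not prompt per package
}

# Usage, when you actually want to do it:
# rebuild_stale_packages()
```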
Bioconductor’s horrifyingly pwnable install
What, you’d like to install some biostatistics software on your campus supercomputing cluster? Easy! Simply download and run this unverifiable obligatorily unencrypted unsigned script from a webserver of unknown provenance!
It is probably usually not often script kiddies spoofing you so as to trojan your campus computing cluster to steal CPU cycles. After all, who would do that?
On an unrelated note, I am looking for investors in a distributed bitcoin mining operation. Contact me privately.
Easy project reload
Make a folder called MyCode with a DESCRIPTION file. Make a subfolder called R. Put R code in .R files in there. Edit, load_all("MyCode"), use the functions.
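The recipe above, sketched end to end. The skeleton is built in a temporary directory so nothing is left behind; the DESCRIPTION fields and the `hello` function are invented for the illustration, and the `load_all` step runs only if devtools is installed.

```r
# Build the skeleton: MyCode/DESCRIPTION and MyCode/R/*.R
pkg <- file.path(tempdir(), "MyCode")
dir.create(file.path(pkg, "R"), recursive = TRUE, showWarnings = FALSE)

writeLines(c("Package: MyCode",
             "Title: Scratch Code",
             "Version: 0.0.1"),
           file.path(pkg, "DESCRIPTION"))

# One illustrative function in one .R file.
writeLines("hello <- function(name) paste('hello,', name)",
           file.path(pkg, "R", "hello.R"))

# Edit, reload, use. Re-run load_all() after every edit.
if (requireNamespace("devtools", quietly = TRUE)) {
  devtools::load_all(pkg, quiet = TRUE)
  print(hello("world"))
}
```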
Hadley Wickham pro-style
Install his devtools.
Read how he makes an R-package.
Here’s an intro that explains how to use the OO facilities of R - although I recommend going for a functional style to avoid pain.
There are step debuggers and other such modern conveniences
Inspecting frames post hoc: recover. In fact, pro-tip, you can invoke it in 3rd party code gracefully:
options(error = utils::recover)
Interactive debugger: browser
Graphical interactive optionally-web-based debugger available in RStudio; if it had any more buzzwords in it, it would socially tag your Instagram and upload it to the NSA’s Internet Of Things to be 3D printed.
Easy command-line invocation: Rio loads CSV from stdin into R as a data.frame, executes given commands, and gets the output as CSV or PNG on stdout.
R for Pythonistas
Many things about R are surprising to me, coming as I do most recently from Python. I’m documenting my perpetual surprise here, in order that it may save someone else the inconvenience of going to all that trouble to be personally surprised.
Importing an R package, unlike importing a Python module, brings in random cruft that may have little to do with the names of the thing you just imported. That is, IMO, poor planning, although history indicates that most language designers don’t agree with me on that:
> npreg
Error: object 'npreg' not found
> library("np")
Nonparametric Kernel Methods for Mixed Datatypes (version 0.40-4)
> npreg
function (bws, ...)
#etc
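For comparison, a sketch of the two ways of getting at a package's names, using `splines` (which ships with R) as a stand-in for any package. The `::` form is the nearest thing to a Python-style qualified import: it fetches one name and attaches nothing.

```r
# Qualified access: loads the namespace but does not touch the search path.
bs_fun <- splines::bs

# library() attaches the package, so every exported name becomes visible.
library(splines)
attached <- "package:splines" %in% search()

# Everything that just landed in scope:
exported <- ls("package:splines")
exported
```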
Further, data structures in R can, and are intended to, provide first-class scopes for looking up names. In the course of your explorations into data you are as apt to bring the names of columns in a data set into scope as the names of functions in a library. This is kind of useful, although the scoping proceedings do make my eyes water when this intersects with function definition.
Formulas are cool and ugly, like Adult Swim, and intimately bound up in the prior point.
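A minimal sketch of both points in base R, with a made-up data frame. `with` evaluates an expression using the columns as variables, and a formula is an unevaluated expression whose names get looked up in whatever `data` you hand over later.

```r
df <- data.frame(height = c(1.5, 1.8, 1.7),
                 weight = c(50, 80, 65))

# Column names act as variables inside with().
ratio <- with(df, weight / height)

# A formula just records names; nothing is looked up yet.
f <- weight ~ height
all.vars(f)

# lm() resolves those names against the columns of df.
fit <- lm(f, data = df)
coef(fit)
```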
assignment to function calls
I need to learn the R terminology to describe this.
R fosters a style of programming where attributes and metadata of data objects are set by using accessor functions, e.g. in matrix column naming:
> m=matrix(0, nrow=2,ncol=2)
> m
     [,1] [,2]
[1,]    0    0
[2,]    0    0
> colnames(m)
NULL
> colnames(m)=c('a','b')
> colnames(m)
[1] "a" "b"
> m
     a b
[1,] 0 0
[2,] 0 0
If you want to know by observing its effects whether an apparent function returns some massaged product of its argument, or whether it decorates the argument, well, check the manual. As a rule, the accessor functions in assignment position update one object, as far as the caller can tell, and print nothing, although plotting functions, e.g., are equally silent.
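For what it is worth, R's own documentation calls these "replacement functions": `colnames(m) <- value` is sugar for calling a function literally named `colnames<-` and rebinding `m` to its result. A minimal sketch; `second<-` is a hypothetical replacement function defined only for the illustration.

```r
m <- matrix(0, nrow = 2, ncol = 2)
colnames(m) <- c("a", "b")

# The same thing, spelled out: call the replacement function, rebind the name.
m2 <- matrix(0, nrow = 2, ncol = 2)
m2 <- `colnames<-`(m2, c("a", "b"))
identical(m, m2)

# Any function named `name<-` with a `value` argument can be used this way.
`second<-` <- function(x, value) {
  x[2] <- value
  x  # must return the whole modified object
}

v <- c(10, 20, 30)
second(v) <- 99
v
```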
No scalar types…
A float is a float vector of size 1:
> 5
[1] 5
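A few base-R checks that make the same point; nothing here is assumed beyond a fresh session.

```r
x <- 5

is.vector(x)        # a bare number is already a vector
length(x)           # of length one
typeof(x)           # stored as a double
identical(x, c(5))  # wrapping it in c() changes nothing
x[1]                # and it can be indexed like any other vector
```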
…yet verbose vector literal syntax
You make vectors by using a call to a function called c. Witness:
> c('a', 'b', 'c', 'd')
[1] "a" "b" "c" "d"
If you type a literal vector in though, it will throw an error:
> 'a', 'b', 'c', 'd'
Error: unexpected ',' in "'a',"
I’m sure there are Reasons for this; it’s just that they are reasons that I don’t care about.