Which conventions for naming variables and functions do you favor in R code?
As far as I can tell, there are several different conventions, all of which coexist in cacophonous harmony:
1. Use of period separator, e.g.
stock.prices <- c(12.01, 10.12)
col.names <- c('symbol','price')
Pros: Has historical precedent in the R community, prevalent throughout the R core, and recommended by Google's R Style Guide.
Cons: Rife with object-oriented connotations, and confusing to R newbies
2. Use of underscores
stock_prices <- c(12.01, 10.12)
col_names <- c('symbol','price')
Pros: A common convention in many programming langs; favored by Hadley Wickham's Style Guide, and used in ggplot2 and plyr packages.
Cons: Not historically used by R programmers; is annoyingly mapped to '<-' operator in Emacs-Speaks-Statistics (alterable with 'ess-toggle-underscore').
3. Use of mixed capitalization (camelCase)
stockPrices <- c(12.01, 10.12)
colNames <- c('symbol','price')
Pros: Appears to have wide adoption in several language communities.
Cons: Has recent precedent, but not historically used (in either R base or its documentation).
Finally, as if it weren't confusing enough, I ought to point out that the Google Style Guide argues for dot notation for variables, but mixed capitalization for functions.
The lack of consistent style across R packages is problematic on several levels. From a developer standpoint, it makes maintaining and extending others' code difficult (esp. where its style is inconsistent with your own). From an R user standpoint, the inconsistent syntax steepens R's learning curve, by multiplying the ways a concept might be expressed (e.g. is that date casting function asDate(), as.date(), or as_date()? No, it's as.Date()).
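One way to settle the "which spelling is it?" question in a running session is to ask R directly. A minimal sketch, assuming a fresh session with only the default packages attached (an attached package could well define one of the other spellings):

```r
# Candidate spellings of the date-casting function mentioned above.
candidates <- c("asDate", "as.date", "as_date", "as.Date")

# exists() checks whether a name is bound on the search path;
# mode = "function" restricts the check to functions.
found <- vapply(
  candidates,
  function(nm) exists(nm, mode = "function"),
  logical(1)
)
print(found)

# The base R spelling works as expected.
d <- as.Date("2010-01-15")
print(class(d))
```

apropos("date", ignore.case = TRUE) is another option when you only remember part of the name.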
As others have mentioned, underscores will screw up a lot of folks. No, it's not verboten but it isn't particularly common either.
Using dots as a separator gets a little hairy with S3 classes and the like.
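To make the S3 point concrete: dispatch looks for functions named generic.class, so a dotted name can be picked up as a method without you intending it. A minimal sketch; describe() and its helpers are hypothetical names made up for illustration, not part of any package:

```r
# Hypothetical generic; S3 dispatch looks for functions named describe.<class>.
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "default method"

# Intended as a standalone helper ("describe a data frame"), but the dotted
# name matches the method pattern for class "data.frame".
describe.data.frame <- function(x, ...) "data.frame helper"

# The "helper" is now what the generic calls for any data frame.
result <- describe(data.frame(a = 1))
print(result)
```

Base R has the reverse ambiguity too: t.test() reads like a t() method for a class called test, but it is not one.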
In my experience, it seems like a lot of the high muckity mucks of R prefer the use of camelCase, with some dot usage and a smattering of underscores.
Underscores all the way! Contrary to popular opinion, there are a number of functions in base R that use underscores. Run
grep("^[^\\.]*$", apropos("_"), value = TRUE)
to see them all. I use the official Hadley style of coding ;)
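For a rough sense of how the conventions compare inside base itself, the names can be tallied by pattern. This is only a sketch: the patterns are crude, a name can match more than one of them, and only the base package is counted (not stats, utils and friends), so treat the numbers as indicative.

```r
# Names exported by the base package only; dot-prefixed names are skipped
# by ls() by default.
base_names <- ls("package:base")

# Crude pattern checks; a single name may be counted under several styles.
counts <- c(
  underscore = sum(grepl("_", base_names, fixed = TRUE)),
  dot        = sum(grepl(".", base_names, fixed = TRUE)),
  camelCase  = sum(grepl("[a-z][A-Z]", base_names))
)
print(counts)

# A few of the underscore names, for illustration.
print(head(grep("_", base_names, value = TRUE, fixed = TRUE)))
```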
I have a preference for mixedCapitals.
But I often use periods to indicate what the variable type is:
mixedCapitals.mat is a matrix. mixedCapitals.lm is a linear model. mixedCapitals.lst is a list object.
and so on.
As I point out here:
How does the verbosity of identifiers affect the performance of a programmer?
it's worth bearing in mind how understandable your variable names are to your co-workers/users if they are non-native speakers...
For that reason I'd say underscores and periods are better than capitalisation, but as you point out consistency is essential within your script.
This comes down to personal preference, but I follow the Google Style Guide because it's consistent with the style of the core team. I have yet to see an underscore in a variable in base R.
I like camelCase when the camel actually provides something meaningful -- like the datatype.
dfProfitLoss, where df = dataframe
or
vdfMergedFiles(), where the function takes in a vector and spits out a dataframe
While I think _ really adds to the readability, there just seem to be too many issues with using ., -, _ or other characters in names. Especially if you work across several languages.