What is your preferred style for naming variables in R?

Posted 2019-01-20 21:47

Which conventions for naming variables and functions do you favor in R code?

As far as I can tell, there are several different conventions, all of which coexist in cacophonous harmony:

1. Use of period separator, e.g.

  stock.prices <- c(12.01, 10.12)
  col.names    <- c('symbol','price')

Pros: Has historical precedent in the R community, is prevalent throughout the R core, and is recommended by Google's R Style Guide.

Cons: Rife with object-oriented connotations, and confusing to R newbies.

2. Use of underscores

  stock_prices <- c(12.01, 10.12)
  col_names    <- c('symbol','price')

Pros: A common convention in many programming languages; favored by Hadley Wickham's Style Guide, and used in the ggplot2 and plyr packages.

Cons: Not historically used by R programmers; annoyingly mapped to the '<-' operator in Emacs Speaks Statistics (this can be changed with 'ess-toggle-underscore').

3. Use of mixed capitalization (camelCase)

  stockPrices <- c(12.01, 10.12)
  colNames    <- c('symbol','price')

Pros: Appears to have wide adoption in several language communities.

Cons: Has recent precedent, but not historically used (in either R base or its documentation).

Finally, as if it weren't confusing enough, I ought to point out that the Google Style Guide argues for dot notation for variables, but mixed capitalization for functions.

The lack of consistent style across R packages is problematic on several levels. From a developer's standpoint, it makes maintaining and extending others' code difficult (especially where its style is inconsistent with your own). From an R user's standpoint, the inconsistent syntax steepens R's learning curve by multiplying the ways a concept might be expressed (e.g. is that date casting function asDate(), as.date(), or as_date()? No, it's as.Date()).
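To make the as.Date() point concrete, here is a minimal sketch that checks which of those four spellings resolves to a function in the current session. The result depends on which packages are attached (lubridate, for instance, provides as_date()), so no output is shown for that part.

```r
# Candidate spellings of the date-casting function mentioned above.
candidates <- c("asDate", "as.date", "as_date", "as.Date")

# TRUE where a function of that name is visible from the current session.
found <- vapply(
  candidates,
  function(nm) exists(nm, mode = "function"),
  logical(1)
)
print(found)

# The base R spelling parses an ISO 8601 date string.
d <- as.Date("2019-01-20")
print(format(d, "%Y"))
```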

9 answers
啃猪蹄的小仙女
#2 · 2019-01-20 22:02

As others have mentioned, underscores will screw up a lot of folks. No, it's not verboten but it isn't particularly common either.

Using dots as a separator gets a little hairy with S3 classes and the like.
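A small sketch of why dots and S3 can collide. The generic describe() and both functions below are hypothetical names made up for illustration; the point is only that any function named <generic>.<class> is picked up by S3 dispatch, whether or not that was the intent.

```r
# Hypothetical generic: dispatches on the class of its first argument.
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "no special method"

# Perhaps meant as an ordinary helper, but its name has the form
# <generic>.<class>, so R uses it as the describe() method for
# objects of class "data.frame".
describe.data.frame <- function(x, ...) "picked up as an S3 method"

print(describe(1:3))
print(describe(data.frame(a = 1)))
```

With underscores or camelCase (describe_data_frame, describeDataFrame) the helper could not be mistaken for a method.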

In my experience, it seems like a lot of the high muckity mucks of R prefer the use of camelCase, with some dot usage and a smattering of underscores.

别忘想泡老子
#3 · 2019-01-20 22:03

Underscores all the way! Contrary to popular opinion, there are a number of functions in base R that use underscores. Run grep("^[^\\.]*$", apropos("_"), value = TRUE) to see them all.
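The same one-liner, split into steps as a rough sketch. The variable names are mine, and the list you get back depends on your R version and on which packages are attached, so no output is shown.

```r
# Every name on the search path that contains an underscore.
with_underscore <- utils::apropos("_")

# Keep only the names that contain no dot at all.
# Inside a character class, "." is a literal dot.
underscore_only <- grep("^[^.]*$", with_underscore, value = TRUE)

print(head(underscore_only, 20))
```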

I use the official Hadley style of coding ;)

smile是对你的礼貌
#4 · 2019-01-20 22:06

I have a preference for mixedCapitals.

But I often use periods to indicate what the variable type is:

mixedCapitals.mat is a matrix. mixedCapitals.lm is a linear model. mixedCapitals.lst is a list object.

and so on.
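For illustration only, a sketch of that suffix habit with made-up toy data. The variable names and values are assumptions, not taken from real code.

```r
# Toy data, invented for this example.
day   <- 1:4
price <- c(12.01, 10.12, 11.50, 10.30)

stockPrices.mat <- cbind(day = day, price = price)  # matrix
stockPrices.lst <- list(day = day, price = price)   # list
stockPrices.lm  <- stats::lm(price ~ day)           # linear model

# The suffix is only a naming habit; R does not check it,
# so the actual type still has to be tested explicitly.
print(is.matrix(stockPrices.mat))
print(is.list(stockPrices.lst))
print(inherits(stockPrices.lm, "lm"))
```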

该账号已被封号
#5 · 2019-01-20 22:07

As I point out here:

How does the verbosity of identifiers affect the performance of a programmer?

it's worth bearing in mind how understandable your variable names are to your co-workers/users if they are non-native speakers...

For that reason I'd say underscores and periods are better than capitalisation, but as you point out consistency is essential within your script.

疯言疯语
#6 · 2019-01-20 22:10

This comes down to personal preference, but I follow the Google style guide because it's consistent with the style of the core team. I have yet to see an underscore in a variable name in base R.

干净又极端
#7 · 2019-01-20 22:11

I like camelCase when the camel actually provides something meaningful -- like the datatype.

dfProfitLoss, where df = dataframe

or

vdfMergedFiles(), where the function takes in a vector and spits out a dataframe

While I think _ really adds to the readability, there just seem to be too many issues with using ., -, _ or other characters in names, especially if you work across several languages.
