Indexing a dataframe with $ inside a function?

Posted 2020-05-03 12:44

Question:

Many R textbooks encourage the use of $ to retrieve variables (columns) from data.frames^. However, I found that this does not work inside a function, and I can't figure out why.

data(BOD)
print(BOD)

# These work. 
BOD$'demand'
BOD[ ,'demand']

# This works.
myFunc1 <- function(x, y){
  z <- x[ , y]
  return(z)
}
out <- myFunc1(BOD, 'demand')

# This doesn't work (it returns NULL).
myFunc2 <- function(x, y){
  z <- x$y
  return(z)
}
out <- myFunc2(BOD, 'demand')

I notice that in the R Language Definition it says:

The form using $ applies to recursive objects such as lists and pairlists. It allows only a literal character string or a symbol as the index. That is, the index is not computable: for cases where you need to evaluate an expression to find the index, use x[[expr]]. When $ is applied to a non-recursive object the result used to be always NULL: as from R 2.6.0 this is an error.

Is myFunc2 above an example where $ is not being supplied a literal character string?

^ Zuur 2009 'Beginner's Guide to R' p 61

^ Spector 2008 'Data Manipulation with R' p 26, 64, 69

Answer 1:

You can also use [[ instead of $:

myFunc2 <- function(x, y){
  z <- x[[y]]
  return(z)
}
myFunc2(BOD, 'demand')
[1]  8.3 10.3 19.0 16.0 15.6 19.8
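
As for why the original myFunc2 returns NULL: $ never evaluates its right-hand side, so x$y asks for a column literally named y, whatever value the argument y holds. A minimal sketch of that, where lookup_dollar, lookup_bracket and BOD_y are names made up for illustration:

```r
# Uses the built-in BOD data set from the question.
data(BOD)

# `$` takes its index literally: x$y always means "the element named y".
lookup_dollar  <- function(x, y) x$y      # ignores the value passed as y
lookup_bracket <- function(x, y) x[[y]]   # evaluates y, then looks it up

lookup_dollar(BOD, "demand")    # NULL, because BOD has no column called "y"
lookup_bracket(BOD, "demand")   # the demand column

# Adding a column that really is named "y" shows what `$` was looking for.
BOD_y <- BOD
BOD_y$y <- BOD_y$demand * 2
lookup_dollar(BOD_y, "demand")  # returns the y column, not the demand column
```

So yes, inside myFunc2 the index is the symbol y itself, not the string stored in it.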


Answer 2:

Personally, I think the dollar operator $ is handy at the R console: it supports tab completion and partial matching of names, which makes it useful for interactive work. But if you want to use it within your function, you should create the call using do.call, like this:

myFunc2 <- function(x, y){
  z <- do.call('$',list(x,y))
  return(z)
}
myFunc2(BOD,'demand')
[1]  8.3 10.3 19.0 16.0 15.6 19.8

But here it is simpler to use [, as you mentioned:

myFunc2 <- function(x, y){
  z <- x[, y]
  return(z)
}
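
The do.call version works because y is evaluated before `$` is called, so the call that actually runs is effectively `$`(BOD, "demand"). A quick sketch to check that it agrees with the bracket forms on the BOD data; via_do_call and the col_* variables are illustrative names only:

```r
data(BOD)

# do.call() receives the *value* of y, so `$` sees a literal string.
via_do_call <- function(x, y) do.call("$", list(x, y))

col_dollar  <- via_do_call(BOD, "demand")
col_single  <- BOD[, "demand"]
col_double  <- BOD[["demand"]]

# All three retrieve the same numeric vector.
same <- identical(col_dollar, col_single) && identical(col_single, col_double)
print(same)
```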


Answer 3:

If you want to mimic the $ operator exactly, you can use [[ and set the exact argument to FALSE to allow partial matching:

BOD <- data.frame(demand = 1:10)

myFunc2 <- function(x, y) x[[y, exact = FALSE]]

BOD$dem
## [1]  1  2  3  4  5  6  7  8  9 10

BOD[["dem"]]
## NULL

myFunc2(BOD, "dem")
## [1]  1  2  3  4  5  6  7  8  9 10
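
One thing to keep in mind with partial matching is that the prefix has to be unique. A small sketch with a made-up data frame (amb and partial_get are hypothetical names, and the second column is added only to create the clash):

```r
# Two columns share the prefix "dem".
amb <- data.frame(demand = 1:3, demo = 4:6)

# Same idea as myFunc2 above.
partial_get <- function(x, y) x[[y, exact = FALSE]]

partial_get(amb, "dema")  # unique prefix: returns the demand column
partial_get(amb, "dem")   # ambiguous prefix: returns NULL, just like amb$dem
```

For that reason exact matching with plain x[[y]] is usually the safer default inside functions, with exact = FALSE reserved for cases where you really want $-style behaviour.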