Determine which column name is causing 'undefined columns selected'

Posted 2019-05-07 05:07

Question:

I'm trying to extract a smaller data frame (a subset of the columns) from a very large data frame, using

data.new <- subset(data, select = vector)

where vector is a character vector containing the names of the columns I want to keep. When I run this I get

Error in `[.data.frame`(x, r, vars, drop = drop) : 
  undefined columns selected

Is there a way to identify which specific column name in the vector is undefined? Through trial and error I've narrowed it down to about 400 names, but that is still too many to check by hand.

Answer 1:

Find the elements of your vector that are not %in% the names() of your data frame.

Working example:

dd <- data.frame(a=1,b=2)
subset(dd,select=c("a"))
##   a
## 1 1

Now try something that doesn't work:

v <- c("a","d")
subset(dd,select=v)
## Error in `[.data.frame`(x, r, vars, drop = drop) : 
##    undefined columns selected

v[!v %in% names(dd)]
## [1] "d"
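With a vector of around 400 names it may also help to know where each undefined name sits in the vector. A small sketch, using the same toy dd but a longer, made-up v: wrapping the same logical test in which() gives the positions.

```r
dd <- data.frame(a = 1, b = 2)
v <- c("a", "d", "b", "e")   # made-up vector of requested names

# TRUE where the requested name has no matching column in dd
undefined <- !v %in% names(dd)

# Positions of the undefined names within v
which(undefined)
## [1] 2 4

# The undefined names themselves
v[undefined]
## [1] "d" "e"
```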

Or

setdiff(v,names(dd))
## [1] "d"
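If this comes up often, one option is to run the check before subsetting and fail with a message that lists the offending names. The helper below is a hypothetical sketch (subset_cols is not a base R function). It indexes with df[cols] rather than calling subset(), because ?subset describes subset() as a convenience function for interactive use whose non-standard evaluation can have unexpected consequences inside functions.

```r
# Hypothetical helper: select columns by name, reporting any undefined ones
subset_cols <- function(df, cols) {
  undefined <- setdiff(cols, names(df))
  if (length(undefined) > 0) {
    stop("undefined columns selected: ",
         paste(undefined, collapse = ", "),
         call. = FALSE)
  }
  df[cols]   # list-style indexing always returns a data frame
}

dd <- data.frame(a = 1, b = 2)
subset_cols(dd, "a")
##   a
## 1 1

# Catch the error here only so the example keeps running
tryCatch(subset_cols(dd, c("a", "d")), error = conditionMessage)
## [1] "undefined columns selected: d"
```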

The last few lines of the example code in ?match show a similar case.
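Roughly as it appears there, the ?match examples define a small "x without y" operator. The sketch below reproduces it and compares it with setdiff(); the vector v is made up for illustration. One practical difference: %w/o% keeps repeated names, while setdiff() reports each missing name only once, which can matter if the 400 names contain duplicates.

```r
# "x without y" operator, modelled on the ?match examples
"%w/o%" <- function(x, y) x[!x %in% y]

dd <- data.frame(a = 1, b = 2)
v <- c("a", "d", "d", "e")   # made-up vector with a repeated name

v %w/o% names(dd)
## [1] "d" "d" "e"

# setdiff() also removes duplicates from the result
setdiff(v, names(dd))
## [1] "d" "e"
```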



Tags: r subset