Keyed lookup on data.table without 'with'

2019-01-09 16:18发布

问题:

I have a data.table structure like so (except mine is really huge):

dt <- data.table(x=1:5, y=3:7, key='x')

I want to look up rows in that structure by another variable whose name is x (notice - the same as the name of the key of dt):

x <- 3:4
dt2 <- dt[ J(x) ]

This doesn't work, because the lookup sees the column name first, and the local variable is obscured:

dt2
#    x y
# 1: 1 3
# 2: 2 4
# 3: 3 5
# 4: 4 6
# 5: 5 7

I thought about the with argument for [.data.table, but that only applies to the j argument, not the i argument.

Is there something similar for the i argument?

If not, such a thing would be handy whenever I'm using a local variable and I don't know the complete list of column names in dt, to avoid conflicts.

回答1:

There is an item in the NEWS for 1.8.2 that suggests a ..() syntax will be added at some point, allowing this

New DT[.(...)] syntax (in the style of package plyr) is identical to DT[list(...)], DT[J(...)] and DT[data.table(...)]. We plan to add ..(), too, so that .() and ..() are analogous to the file system's ./ and ../; i.e., .() evaluates within the frame of DT and ..() in the parent scope.

In the mean time, you can get from the appropriate environment

dt[J(get('x', envir = parent.frame(3)))]
##    x y
## 1: 3 5
## 2: 4 6

or you could eval the whole call to list(x) or J(x)

dt[eval(list(x))]
dt[eval(J(x))]
dt[eval(.(x))]


回答2:

New answer, now that I think I understand what was requested:

> X <- data.table(x=x)
> merge(dt, X)
   x y
1: 3 6
2: 4 7


回答3:

Adding some benchmarking results, by request.

dt is a 53080731 x 5 data.table object, keyed by a numeric column with around 100 unique values, fairly evenly distributed. x is a vector containing 5 of those values.

library(microbenchmark)
> mb <- microbenchmark(
+     dt[eval(J(x))],
+     merge(dt, data.table(x)),
+     times=10
+ )
> mb
Unit: milliseconds
                     expr      min       lq    median       uq      max neval
           dt[eval(J(x))]  127.324  127.549  133.5305  154.410  159.433    10
 merge(dt, data.table(x)) 5028.349 5083.792 5129.6590 5170.451 5250.255    10

@Tyler, if you can assist me with how to use qdap::lookup() for this case with multiple columns, I can add that too.