I have a data.table
structure like so (except mine is really huge):
dt <- data.table(x=1:5, y=3:7, key='x')
I want to look up rows in that structure by another variable whose name is x
(notice - the same as the name of the key of dt
):
x <- 3:4
dt2 <- dt[ J(x) ]
This doesn't work, because the lookup sees the column name first, and the local variable is obscured:
dt2
# x y
# 1: 1 3
# 2: 2 4
# 3: 3 5
# 4: 4 6
# 5: 5 7
I thought about the with
argument for [.data.table
, but that only applies to the j
argument, not the i
argument.
Is there something similar for the i
argument?
If not, such a thing would be handy whenever I'm using a local variable and I don't know the complete list of column names in dt
, to avoid conflicts.
There is an item in the NEWS for 1.8.2 that suggests a ..()
syntax will be added at some point, allowing this
New DT[.(...)] syntax (in the style of package plyr) is identical to
DT[list(...)], DT[J(...)] and DT[data.table(...)]. We plan to add ..(), too, so
that .() and ..() are analogous to the file system's ./ and ../; i.e., .()
evaluates within the frame of DT and ..() in the parent scope.
In the mean time, you can get
from the appropriate environment
dt[J(get('x', envir = parent.frame(3)))]
## x y
## 1: 3 5
## 2: 4 6
or you could eval
the whole call to list(x)
or J(x)
dt[eval(list(x))]
dt[eval(J(x))]
dt[eval(.(x))]
New answer, now that I think I understand what was requested:
> X <- data.table(x=x)
> merge(dt, X)
x y
1: 3 6
2: 4 7
Adding some benchmarking results, by request.
dt
is a 53080731 x 5 data.table
object, keyed by a numeric column with around 100 unique values, fairly evenly distributed. x
is a vector containing 5 of those values.
library(microbenchmark)
> mb <- microbenchmark(
+ dt[eval(J(x))],
+ merge(dt, data.table(x)),
+ times=10
+ )
> mb
Unit: milliseconds
expr min lq median uq max neval
dt[eval(J(x))] 127.324 127.549 133.5305 154.410 159.433 10
merge(dt, data.table(x)) 5028.349 5083.792 5129.6590 5170.451 5250.255 10
@Tyler, if you can assist me with how to use qdap::lookup()
for this case with multiple columns, I can add that too.