I would like to subset my data based on multiple inequality conditions using the data.table package. The examples in the data.table manual show how to do this with character variables, but not with numeric inequalities. I also see how to do this using the subset function. But I really would like to take advantage of the data.table binary search speed. Below is an example of what I am trying to do.
library(data.table)
data <- data.table(X=seq(-5,5,1), Y=seq(-5,5,1), Z=seq(-5,5,1))
data
setkey(data, X, Y, Z)
#the data.frame way
data[X > 0 & Y > 0 & Z > 0]
#the data.table way (does not work as I expected)
data[J(>0, >0, >0)]
The solution is quite fast and straightforward using the package dplyr.
dplyr is showing to be one of the easiest and fastest packages for managing data frames. Check this great tutorial here: http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
The RStudio team have alsoe produced a nice Cheat Sheet, here: http://www.rstudio.com/resources/cheatsheets/
I run some benchmarks
Simple clone example data to 5.5M rows.
Performance of that task seems to be hard to improve. Often it depends on the data.