I want to get a list of values that fall in between multiple ranges.
library(data.table)
values <- data.table(value = c(1:100))
range <- data.table(start = c(6, 29, 87), end = c(10, 35, 92))
I need the results to include only the values that fall in between those ranges:
results <- c(6, 7, 8, 9, 10, 29, 30, 31, 32, 33, 34, 35, 87, 88, 89, 90, 91, 92)
I am currently doing this with a for loop,
results <- data.table(NULL)
# Append the values that fall inside each range, one range at a time
for (i in seq_len(nrow(range))) {
  results <- rbind(results,
                   data.table(result = values[value >= range[i, start] &
                                                value <= range[i, end], value]))
}
however the actual dataset is quite large and I am looking for a more efficient way.
Any suggestions are appreciated! Thank you!
You can use the non-equi join capability of data.table for this.
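A minimal sketch of such a non-equi join, reusing the values and range tables from the question. The output column name results is picked here for illustration only. x.value is used because, in a non-equi join, the join column itself would otherwise carry the bounds taken from range rather than the original values.

```r
library(data.table)

values <- data.table(value = 1:100)
range  <- data.table(start = c(6, 29, 87), end = c(10, 35, 92))

# For every row of `range`, match all rows of `values` whose value lies
# in [start, end]; x.value returns the original value from `values`.
res <- values[range, on = .(value >= start, value <= end), .(results = x.value)]

res$results
```

The result has one row per matching (range, value) pair, so overlapping ranges would yield repeated values.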
Or as per the suggestion of @Henrik: values[value %inrange% range]. This also works very well on data.tables with multiple columns.

If you have the latest CRAN version of data.table you can use non-equi joins. For example, you can create an index which you can then use to subset your original data.
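The index approach could look roughly like the sketch below, again with the tables from the question. The name idx is an illustrative choice. which = TRUE asks the join to return the matching row numbers of values instead of the joined rows.

```r
library(data.table)

values <- data.table(value = 1:100)
range  <- data.table(start = c(6, 29, 87), end = c(10, 35, 92))

# Row numbers of `values` that fall inside at least one range
idx <- values[range, on = .(value >= start, value <= end), which = TRUE]

# Use the index to subset the original data, keeping all of its columns
subset_dt <- values[idx]
```

If the ranges can overlap, idx may contain the same row number more than once; wrapping it in unique() is one way to avoid duplicated rows.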
Here is one method using lapply and %between%.
This method loops through the ranges data.table and subsets values in each iteration according to the variable in ranges. lapply returns a list, which rbindlist constructs into a data.table. If you want a vector, replace rbindlist with unlist.
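A sketch of what that lapply and %between% approach might look like, using the tables from the question. The helper names lo, hi and per_range are illustrative and not taken from the original answer.

```r
library(data.table)

values <- data.table(value = 1:100)
range  <- data.table(start = c(6, 29, 87), end = c(10, 35, 92))

# One subset of `values` per row of `range`
per_range <- lapply(seq_len(nrow(range)), function(i) {
  lo <- range$start[i]
  hi <- range$end[i]
  values[value %between% c(lo, hi)]  # bounds are inclusive by default
})

# Stack the list of data.tables into a single data.table
results <- rbindlist(per_range)

# Or flatten to a plain vector instead
results_vec <- unlist(lapply(per_range, function(d) d$value))
```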
Benchmarks
Just to check the speeds of each suggestion on the given data, I ran a quick comparison
This returned
As might be expected, my looping solution is quite a bit slower than the others. However, the clear winner is %inrange%, which is essentially a vectorized extension of %between%.
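To make the relationship between the two operators concrete, the sketch below builds the same subset twice: once by combining one between() check per range by hand, and once with a single %inrange% call. The extra label column is hypothetical and only there to show that whole rows are kept in a multi-column table.

```r
library(data.table)

# `label` is an illustrative extra column, not part of the question's data
values <- data.table(value = 1:100, label = rep(letters[1:4], 25))
range  <- data.table(start = c(6, 29, 87), end = c(10, 35, 92))

# between() tests one interval at a time, so the per-range results
# are combined with a logical OR
in_any <- Reduce(`|`, lapply(seq_len(nrow(range)), function(i) {
  between(values$value, range$start[i], range$end[i])
}))
by_between <- values[in_any]

# %inrange% checks every interval in a single vectorized call
by_inrange <- values[value %inrange% range]

same_rows <- identical(by_between$value, by_inrange$value)
```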