I have a vector of numeric elements, and a dataframe with two columns that define the start and end points of intervals. Each row in the dataframe is one interval. I want to find out which interval each element in the vector belongs to.
Here's some example data:
# Find which interval each element of the vector belongs to
library(tidyverse)
elements <- c(0.1, 0.2, 0.5, 0.9, 1.1, 1.9, 2.1)
# tribble() replaces the deprecated frame_data()
intervals <- tribble(~phase, ~start, ~end,
                     "a",    0,      0.5,
                     "b",    1,      1.9,
                     "c",    2,      2.5)
The same example data for those who object to the tidyverse:
elements <- c(0.1, 0.2, 0.5, 0.9, 1.1, 1.9, 2.1)
intervals <- structure(list(phase = c("a", "b", "c"),
                            start = c(0, 1, 2),
                            end = c(0.5, 1.9, 2.5)),
                       .Names = c("phase", "start", "end"),
                       row.names = c(NA, -3L),
                       class = "data.frame")
Here's one way to do it:
library(intrval)
phases_for_elements <-
  map(elements, ~.x %[]% data.frame(intervals[, c('start', 'end')])) %>%
  map(., ~unlist(intervals[.x, 'phase']))
Here's the output:
[[1]]
phase
"a"
[[2]]
phase
"a"
[[3]]
phase
"a"
[[4]]
character(0)
[[5]]
phase
"b"
[[6]]
phase
"b"
[[7]]
phase
"c"
But I'm looking for a simpler method with less typing. I've seen findInterval in related questions, but I'm not sure how I can use it in this situation.
For completeness' sake, here is another way, using the intervals package:

David Arenburg's mention of non-equi joins was very helpful for understanding what general kind of problem this is (thanks!). I can see now that it's not implemented for dplyr. Thanks to this answer, I see that there is a fuzzyjoin package that can do it in the same idiom. But it's barely any simpler than my map solution above (though more readable, in my view), and doesn't hold a candle to thelatemail's cut answer for brevity.

For my example above, the fuzzyjoin solution would be
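A minimal sketch of such a call, assuming `fuzzy_left_join()` with one comparison function per column pair; `res` is a name introduced here for illustration:

```r
library(fuzzyjoin)

elements <- c(0.1, 0.2, 0.5, 0.9, 1.1, 1.9, 2.1)
intervals <- data.frame(phase = c("a", "b", "c"),
                        start = c(0, 1, 2),
                        end   = c(0.5, 1.9, 2.5),
                        stringsAsFactors = FALSE)

# Keep every element and attach the interval (if any) that satisfies
# elements >= start and elements <= end; elements with no match get NA.
res <- fuzzy_left_join(data.frame(elements), intervals,
                       by = c("elements" = "start", "elements" = "end"),
                       match_fun = list(`>=`, `<=`))
```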
Which gives:
cut is possibly useful here.

Here's a possible solution using the new "non-equi" joins in data.table (v >= 1.9.8). While I doubt you'll like the syntax, it should be a very efficient solution.

Also, regarding findInterval, this function assumes continuity in your intervals, while this isn't the case here, so I doubt there is a straightforward solution using it.

Regarding the above code, I find it pretty self-explanatory: join intervals and elements by the condition specified in the on argument. That's pretty much it.

There is a certain caveat here though: start, end and elements should all be of the same type, so if one of them is integer, it should be converted to numeric first.

Just lapply works:

or in purrr, if you purrrfurrr,

Here is kind of a "one-liner" which (mis-)uses foverlaps from the data.table package, but David's non-equi join is still more concise:
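A minimal sketch of the two data.table approaches on the example data, assuming data.table >= 1.9.8 for the non-equi join. The names `iv`, `el`, and `pts` are introduced here for illustration only:

```r
library(data.table)

elements <- c(0.1, 0.2, 0.5, 0.9, 1.1, 1.9, 2.1)
iv <- data.table(phase = c("a", "b", "c"),
                 start = c(0, 1, 2),
                 end   = c(0.5, 1.9, 2.5))
el <- data.table(elements = elements)

# Non-equi join: for each row of el, look up the row of iv with
# start <= elements and end >= elements; unmatched elements give NA.
phase_nonequi <- iv[el, on = .(start <= elements, end >= elements), phase]

# foverlaps: treat each element as a zero-width interval [x, x] and
# look for a keyed interval in iv that overlaps it.
pts <- data.table(start = elements, end = elements)
setkey(iv, start, end)
phase_foverlaps <- foverlaps(pts, iv)$phase
```

Both calls keep one result per element, so the output lines up with `elements`; 0.9 falls in the gap between intervals and comes back as `NA` from the non-equi join.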