I have a data frame with multiple date ranges (45 to be exact):
Range Start End
1 2014-01-01 2014-02-30
2 2015-01-10 2015-03-30
3 2016-04-20 2016-10-12
... ... ...
They will never overlap
I also have a data frame with various event dates (200K+):
Event Date
1 2014-01-02
2 2014-03-20
3 2015-04-01
4 2016-08-18
... ...
I want to test if these dates fall within any of these ranges:
Event Date InRange
1 2014-01-02 TRUE
2 2014-03-20 FALSE
3 2015-04-01 FALSE
4 2016-08-18 TRUE
...
What is the best way to perform this test? I have looked at lubridate's between and interval functions as well as various Stackoverflow questions, but cannot find a good solution.
You can create a vector of your date range from the first data frame, then use %in%
operator to check if each date of your events is in this date range. Assuming your first data frame is dateRange
, and second events
, putting the above logic in one line would be:
events$InRange <- events$Date %in% unlist(Map(`:`, dateRange$Start, dateRange$End))
events
Event Date InRange
1 1 2014-01-02 TRUE
2 2 2014-03-20 FALSE
3 3 2015-04-01 FALSE
4 4 2016-08-18 TRUE
Where we used the Map
to create the date range vector. Map
combined with :
operator create a list of date range from the Start
to the End
. Somewhere close to list(2014-01-01 : 2014-02-30, 2015-01-10 : 2015-03-30, 2016-04-20 : 2016-10-12 ...)
(symbolically, not valid), with the unlist
, we flatten it as a vector of date range which could then be used with %in%
conveniently.
Having ordered, non-overlapping intervals in your first "data.frame", you could test -for each event date- if it is above a $Start
and its respective $End
. Using findInterval
to reduce relational comparisons and memory needed.
findInterval(events$Date, ranges$Start) > findInterval(events$Date, ranges$End)
#[1] TRUE FALSE FALSE TRUE
With data (modified "2014-02-30"):
ranges = structure(list(Range = 1:3, Start = structure(c(16071, 16445,
16911), class = "Date"), End = structure(c(16129, 16524, 17086
), class = "Date")), .Names = c("Range", "Start", "End"), row.names = c(NA,
-3L), class = "data.frame")
events = structure(list(Event = 1:4, Date = structure(c(16072, 16149,
16526, 17031), class = "Date")), .Names = c("Event", "Date"), row.names = c(NA,
-4L), class = "data.frame")
Write your own function
to check if a list of dates are in any of a number of intervals.
date.in <- function(x){
m <- NULL
for (i in 1:NROW(df)){m <- c(m, ifelse(x>=df[i,1] & x<=df[i,2], TRUE, FALSE))}
any(m)}
Data:
df <- data.frame(start=c("2014-01-01", "2015-01-10", "2016-04-20"),
end=c("2014-02-30", "2015-03-30", "2016-10-12"))
df[] <- lapply(df, as.character)
s <- c("2014-01-02", "2014-03-20", "2015-04-01", "2016-08-18")
Test using string s
.
as.character(lapply(s, date.in))#TRUE FALSE FALSE TRUE