I have a data frame with multiple date ranges (45 to be exact):
Range Start End
1 2014-01-01 2014-02-30
2 2015-01-10 2015-03-30
3 2016-04-20 2016-10-12
... ... ...
The ranges will never overlap.
I also have a data frame with various event dates (200K+):
Event Date
1 2014-01-02
2 2014-03-20
3 2015-04-01
4 2016-08-18
... ...
I want to test if these dates fall within any of these ranges:
Event Date InRange
1 2014-01-02 TRUE
2 2014-03-20 FALSE
3 2015-04-01 FALSE
4 2016-08-18 TRUE
...
What is the best way to perform this test? I have looked at lubridate's between and interval functions as well as various Stack Overflow questions, but cannot find a good solution.
You can create a vector of all the dates covered by the ranges in your first data frame, then use the `%in%` operator to check whether each event date is in that vector. Assuming your first data frame is `dateRange` and the second is `events`, the above logic fits in one line, using `Map` to create the date range vector. `Map` combined with the `:` operator creates a list of date ranges, each running from `Start` to `End`, somewhere close to `list(2014-01-01 : 2014-02-30, 2015-01-10 : 2015-03-30, 2016-04-20 : 2016-10-12 ...)` (symbolically, not valid code). With `unlist`, we flatten it into a single vector of dates, which can then be used with `%in%` conveniently.
Having ordered, non-overlapping intervals in your first "data.frame", you could test, for each event date, whether it is at or above a `$Start` and at or below its respective `$End`, using `findInterval` to reduce the relational comparisons and memory needed.
With data (modified "2014-02-30"):
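The original code for these two answers is not reproduced here, so the following is only a minimal sketch of both ideas on the sample data from the question. It assumes the data frames are named `dateRange` and `events` as above, that the columns are of class `Date`, and that the invalid "2014-02-30" is replaced by "2014-02-28". The names `allDays`, `inRangeExpand`, `idx` and `inRangeInterval` are made up for illustration.

```r
# Sample data from the question. "2014-02-30" is not a real date,
# so "2014-02-28" is used instead (an assumption).
dateRange <- data.frame(
  Range = 1:3,
  Start = as.Date(c("2014-01-01", "2015-01-10", "2016-04-20")),
  End   = as.Date(c("2014-02-28", "2015-03-30", "2016-10-12"))
)
events <- data.frame(
  Event = 1:4,
  Date  = as.Date(c("2014-01-02", "2014-03-20", "2015-04-01", "2016-08-18"))
)

# Approach 1: expand every range into its individual days, then use %in%.
# Dates are converted to day counts first, so that `:` and %in% both
# operate on plain numbers.
allDays <- unlist(Map(`:`, as.numeric(dateRange$Start), as.numeric(dateRange$End)))
inRangeExpand <- as.numeric(events$Date) %in% allDays

# Approach 2: findInterval() returns, for each event, the index of the last
# Start that is <= the event date (0 if the event precedes every Start).
# This requires Start to be sorted in increasing order.
idx <- findInterval(as.numeric(events$Date), as.numeric(dateRange$Start))
idx[idx == 0] <- NA  # events before the first range match no interval
inRangeInterval <- !is.na(idx) & events$Date <= dateRange$End[idx]

events$InRange <- inRangeInterval
```

The expansion approach builds one element per day covered by the ranges, while the `findInterval` approach needs only one lookup and one comparison per event, which is why it tends to use less memory when ranges are long.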
Write your own `function` to check whether each of a list of dates is in any of a number of intervals.
Data:
Test using strings.
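The function from this answer is not reproduced here. As a rough sketch of what such a hand-written check might look like, the hypothetical helper `in_any_interval` below accepts dates as strings (or `Date` objects), converts them with `as.Date`, and treats each interval as closed at both ends. The sample values come from the question, with "2014-02-30" replaced by "2014-02-28".

```r
# Hypothetical helper: for each date, TRUE if it lies inside at least one
# closed interval [starts[i], ends[i]].
in_any_interval <- function(dates, starts, ends) {
  dates  <- as.Date(dates)
  starts <- as.Date(starts)
  ends   <- as.Date(ends)
  vapply(
    seq_along(dates),
    function(i) any(dates[i] >= starts & dates[i] <= ends),
    logical(1)
  )
}

# Interval bounds given as strings ("2014-02-30" replaced, an assumption).
starts <- c("2014-01-01", "2015-01-10", "2016-04-20")
ends   <- c("2014-02-28", "2015-03-30", "2016-10-12")

result <- in_any_interval(
  c("2014-01-02", "2014-03-20", "2015-04-01", "2016-08-18"),
  starts, ends
)
result
# TRUE FALSE FALSE TRUE
```

This compares every date against every interval, which is simple to read and fast enough for 45 ranges, but it does not exploit the fact that the ranges are sorted and non-overlapping.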