Selecting timestamps within range in R

Posted 2019-03-04 09:33

I have two data frames in R.

df1 looks like this:

id       time
1        2018-08-28 11:22:40
2        2018-08-28 11:35:10
3        2018-08-28 11:50:00
4        2018-08-28 11:55:30

df2 looks like this:

start_time             end_time
2018-08-28 11:22:00    2018-08-28 11:22:50
2018-08-28 11:30:30    2018-08-28 11:34:10
2018-08-28 11:49:00    2018-08-28 11:52:20
2018-08-28 11:57:20    2018-08-28 11:59:40

I'm trying to select the rows from df1 whose time falls within any of the start_time and end_time pairs in df2. In the example above, that would leave me with:

id       time
1        2018-08-28 11:22:40
3        2018-08-28 11:50:00

This problem is similar to that found here but in R instead of SQL. How do I achieve this?

Tags: r time
1 answer
放荡不羁爱自由
#2 · 2019-03-04 10:10

Here is an option using fuzzyjoin:

library(fuzzyjoin)
library(tidyverse)

# Parse the timestamp columns, then left-join each time to every interval that contains it
fuzzy_left_join(
    df1 %>% mutate(time = as.POSIXct(time)),
    df2 %>% mutate(
        start_time = as.POSIXct(start_time),
        end_time = as.POSIXct(end_time)),
    by = c("time" = "start_time", "time" = "end_time"),
    match_fun = list(`>=`, `<=`)) %>%
    filter(!is.na(start_time)) %>%   # drop times that matched no interval
    select(id, time)
#  id                time
#1  1 2018-08-28 11:22:40
#2  3 2018-08-28 11:50:00

Explanation: Interval-join df1 and df2 on the condition time >= start_time & time <= end_time. Because this is a left join, rows of df1 that match no interval get NA in start_time and end_time; filtering those rows out leaves only the entries that lie within some start_time-end_time interval.
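
For comparison, the same "keep a time if it falls inside at least one interval" logic can be sketched in base R without any packages. This is only an illustration, not part of the fuzzyjoin approach above: the data frames are rebuilt inline from the question's example, the time zone "UTC" is an assumption, and the names in_any_interval and result are made up for this sketch.

```r
# Sample data from the question, parsed as POSIXct (tz = "UTC" is an assumption)
df1 <- data.frame(
  id = 1:4,
  time = as.POSIXct(c("2018-08-28 11:22:40", "2018-08-28 11:35:10",
                      "2018-08-28 11:50:00", "2018-08-28 11:55:30"), tz = "UTC")
)
df2 <- data.frame(
  start_time = as.POSIXct(c("2018-08-28 11:22:00", "2018-08-28 11:30:30",
                            "2018-08-28 11:49:00", "2018-08-28 11:57:20"), tz = "UTC"),
  end_time = as.POSIXct(c("2018-08-28 11:22:50", "2018-08-28 11:34:10",
                          "2018-08-28 11:52:20", "2018-08-28 11:59:40"), tz = "UTC")
)

# For each row of df1, test whether its time lies inside at least one interval of df2
in_any_interval <- vapply(
  seq_len(nrow(df1)),
  function(i) any(df1$time[i] >= df2$start_time & df1$time[i] <= df2$end_time),
  logical(1)
)

# Keep only the rows that matched (hypothetical result name)
result <- df1[in_any_interval, ]
print(result)
```

This compares every time against every interval, which is fine for small inputs. Unlike a join, it returns each df1 row at most once even if the intervals in df2 overlap.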


Sample data

df1 <- read.table(text =
    "id       time
1        '2018-08-28 11:22:40'
2        '2018-08-28 11:35:10'
3        '2018-08-28 11:50:00'
4        '2018-08-28 11:55:30'", header = TRUE)

df2 <- read.table(text =
    "start_time             end_time
'2018-08-28 11:22:00'    '2018-08-28 11:22:50'
'2018-08-28 11:30:30'    '2018-08-28 11:34:10'
'2018-08-28 11:49:00'    '2018-08-28 11:52:20'
'2018-08-28 11:57:20'    '2018-08-28 11:59:40'", header = TRUE)