Time difference between rows in R dplyr, different units

Posted 2020-04-09 04:02

Here is my example. I am reading the following file: sample_data

library(dplyr)

txt <- c('"",  "MDN",                  "Cl_Date"',
          '"1",  "A",  "2017-04-15 15:10:42.510"',
          '"2",  "A",  "2017-04-01 14:47:23.210"',
          '"3",  "A",  "2017-04-01 14:49:54.063"',
          '"4",  "B",  "2017-04-30 13:25:00.000"',
          '"5",  "B",  "2017-04-03 17:53:13.217"',
          '"6",  "B",  "2017-04-15 15:17:43.780"')

ts <- read.csv(text = txt, as.is = TRUE)
ts$Cl_Date <- as.POSIXct(ts$Cl_Date)
ts <- ts %>% group_by(MDN) %>% arrange(Cl_Date) %>%
  mutate(time_diff = c(0,diff(Cl_Date)))
ts <-ts[order(ts$MDN, ts$Cl_Date),]

As a result I have

MDN Cl_Date         time_diff
A   4/1/2017 14:47  0
A   4/1/2017 14:49  2.514216665
A   4/15/2017 15:10 20180.80745
B   4/3/2017 17:53  0
B   4/15/2017 15:17 11.89202041
B   4/30/2017 13:25 14.92171551

So I group by the MDN column and compute the difference between consecutive Cl_Date values. As you can see, the difference is sometimes in minutes (group A) and sometimes in days (group B).

Why is the time difference in different units, and how can I correct it?

P.S. I could not reproduce the same example with manual data.frame creation, so I had to read from file.

UPDATE 1 diff(ts$Cl_Date) seems to be consistent, everything is in minutes. Does something break within dplyr?

UPDATE 2

ts <- ts %>% group_by(MDN) %>% arrange(Cl_Date) %>%
  mutate(time_diff_2 = Cl_Date-lag(Cl_Date))

produces the same result.

2 answers
狗以群分
Reply 2 · 2020-04-09 04:37

According to @hadley here, the solution is to use lubridate instead of relying on base R.

This would be something like:

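library(lubridate)

ts %>%
  group_by(MDN) %>%
  arrange(Cl_Date) %>%
  # interval from the previous row to the current one, so gaps are positive
  mutate(time_diff = as.duration(lag(Cl_Date) %--% Cl_Date))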
走好不送
Reply 3 · 2020-04-09 04:43
ts <- ts %>% group_by(MDN) %>% arrange(Cl_Date) %>%
  mutate(time_diff_2 = as.numeric(Cl_Date-lag(Cl_Date), units = 'mins'))

Convert the time difference to a numeric value. You can use the units argument to make the returned values consistent across groups.
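
To see why the units differ in the first place, here is a minimal base-R sketch using the timestamps from the question. It relies on the fact that diff() on POSIXct values chooses its units from the smallest gap in each call, and dplyr runs that call once per group. The helper name gap_mins and the fixed tz = "UTC" are assumptions made for this illustration, not part of the original code.

```r
# Timestamps from the question's sample data, one vector per MDN group.
# tz = "UTC" is an assumption so the result does not depend on the local zone.
a <- as.POSIXct(c("2017-04-01 14:47:23.210", "2017-04-01 14:49:54.063",
                  "2017-04-15 15:10:42.510"), tz = "UTC")
b <- as.POSIXct(c("2017-04-03 17:53:13.217", "2017-04-15 15:17:43.780",
                  "2017-04-30 13:25:00.000"), tz = "UTC")

# diff() picks units from the smallest gap in each call, so groups can differ.
units(diff(a))  # "mins"
units(diff(b))  # "days"

# c(0, <difftime>) drops the units attribute, leaving bare numbers.
class(c(0, diff(a)))  # "numeric"

# Hypothetical helper: force minutes before the attribute is dropped.
gap_mins <- function(x) c(0, as.numeric(diff(x), units = "mins"))

gap_mins(a)
gap_mins(b)
```

Inside a grouped mutate() the same idea applies: wrap the difference in as.numeric(..., units = "mins") as shown above, so every group is reported in minutes regardless of how large its gaps are.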
