Here is my example. I am reading the following file: sample_data
library(dplyr)
txt <- c('"", "MDN", "Cl_Date"',
'"1", "A", "2017-04-15 15:10:42.510"',
'"2", "A", "2017-04-01 14:47:23.210"',
'"3", "A", "2017-04-01 14:49:54.063"',
'"4", "B", "2017-04-30 13:25:00.000"',
'"5", "B", "2017-04-03 17:53:13.217"',
'"6", "B", "2017-04-15 15:17:43.780"')
ts <- read.csv(text = txt, as.is = TRUE)
ts$Cl_Date <- as.POSIXct(ts$Cl_Date)
ts <- ts %>% group_by(MDN) %>% arrange(Cl_Date) %>%
  mutate(time_diff = c(0, diff(Cl_Date)))
ts <- ts[order(ts$MDN, ts$Cl_Date), ]
As a result I have
MDN  Cl_Date          time_diff
A    4/1/2017 14:47   0
A    4/1/2017 14:49   2.514216665
A    4/15/2017 15:10  20180.80745
B    4/3/2017 17:53   0
B    4/15/2017 15:17  11.89202041
B    4/30/2017 13:25  14.92171551
So I group by the MDN column and compute the difference between consecutive Cl_Date values. As you can see, the difference is sometimes in minutes (group A) and sometimes in days (group B).
Why is the time difference in different units, and how can I correct it?
P.S. I could not reproduce the same example by creating the data.frame manually, so I had to read it from a file.
UPDATE 1
diff(ts$Cl_Date) seems to be consistent: everything is in minutes. Does something break within dplyr?
UPDATE 2
ts <- ts %>% group_by(MDN) %>% arrange(Cl_Date) %>%
  mutate(time_diff_2 = Cl_Date - lag(Cl_Date))
produces the same result.
According to @hadley here, the solution is to use lubridate instead of relying on base R.
This would be something like:
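A minimal sketch of the lubridate route, not necessarily the code the linked comment had in mind. It assumes the lubridate package is installed and uses its interval() and time_length() functions. The helper name minutes_between is hypothetical, the timestamps are the group A values from the question with fractional seconds dropped, and the UTC time zone is an assumption.

```r
library(lubridate)

# Hypothetical helper: gaps between consecutive timestamps, always in minutes.
# The first element has no predecessor, so it is set to 0 as in the question.
minutes_between <- function(x) {
  gaps <- interval(head(x, -1), tail(x, -1))  # one interval per consecutive pair
  c(0, time_length(gaps, unit = "minute"))    # explicit unit, never auto-selected
}

# Group A from the question (assumed UTC, fractional seconds dropped)
a <- ymd_hms(c("2017-04-01 14:47:23",
               "2017-04-01 14:49:54",
               "2017-04-15 15:10:42"))

minutes_between(a)
```

Inside the grouped mutate() from the question, a helper like this could stand in for c(0, diff(Cl_Date)), since the unit is stated explicitly rather than chosen per group.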
Convert the time difference to a numeric value. You can use the units argument to make the return values consistent.
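The sketch below illustrates both the likely cause and this fix using base R only. diff() on date-times returns a difftime whose unit is chosen automatically on each call from the smallest gap, and a grouped mutate() calls it once per group, so c(0, ...) then drops the unit and leaves numbers on different scales. The helper name gap_mins is hypothetical, the timestamps are copied from the question with fractional seconds dropped, and the UTC time zone is an assumption.

```r
# Timestamps from the question, split by group (assumed UTC)
a <- as.POSIXct(c("2017-04-01 14:47:23", "2017-04-01 14:49:54",
                  "2017-04-15 15:10:42"), tz = "UTC")
b <- as.POSIXct(c("2017-04-03 17:53:13", "2017-04-15 15:17:43",
                  "2017-04-30 13:25:00"), tz = "UTC")

# Each call picks its own unit from the smallest gap it sees
units(diff(a))  # "mins": the smallest gap in A is about 2.5 minutes
units(diff(b))  # "days": the smallest gap in B is about 11.9 days

# Hypothetical helper: fix the unit while converting to numeric,
# so every group is expressed in minutes
gap_mins <- function(x) c(0, as.numeric(diff(x), units = "mins"))

gap_mins(a)
gap_mins(b)
```

In the pipeline from the question, this would amount to writing mutate(time_diff = c(0, as.numeric(diff(Cl_Date), units = "mins"))).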