Good afternoon
I am not an R user, but I need to get the difference between the first date and last date within RFID, to create a new column X. Therefore, the first value needs to be 1 (not zero), the second 2, ..., n.
Here an example of the data.
Thanks in advance.
RFID visit_date ADFI location
985152014315936 2017-11-25 2133 16
985152014315936 2017-11-26 2186 16
985152014315936 2017-11-27 3489 16
985152014315936 2017-11-28 2432 16
985152014315937 2017-11-24 15 17
985152014315937 2017-11-25 1512 17
985152014315937 2017-11-26 2378 17
985152014315937 2017-11-27 3241 17
985152014315938 2017-11-24 584 17
985152014315938 2017-11-25 1689 17
985152014315938 2017-11-26 2807 17
985152014315938 2017-11-27 2369 17
985152014315938 2017-11-28 2576 17
985152014315939 2017-11-25 1084 17
985152014315939 2017-11-26 3489 17
985152014315939 2017-11-27 2630 17
985152014315939 2017-11-28 3585 17
985152014315939 2017-11-29 3433 17
985152014315939 2017-11-30 2962 17
Here is a solution using dplyr
and lubridate
:
require(tidyverse);
require(lubridate);
df %>% group_by(RFID) %>% mutate(X = max(ymd(visit_date)) - min(ymd(visit_date)));
## A tibble: 19 x 5
## Groups: RFID [4]
# RFID visit_date ADFI location X
# <dbl> <fct> <int> <int> <time>
# 1 985152014315936 2017-11-25 2133 16 3
# 2 985152014315936 2017-11-26 2186 16 3
# 3 985152014315936 2017-11-27 3489 16 3
# 4 985152014315936 2017-11-28 2432 16 3
# 5 985152014315937 2017-11-24 15 17 3
# 6 985152014315937 2017-11-25 1512 17 3
# 7 985152014315937 2017-11-26 2378 17 3
# 8 985152014315937 2017-11-27 3241 17 3
# 9 985152014315938 2017-11-24 584 17 4
#10 985152014315938 2017-11-25 1689 17 4
#11 985152014315938 2017-11-26 2807 17 4
#12 985152014315938 2017-11-27 2369 17 4
#13 985152014315938 2017-11-28 2576 17 4
#14 985152014315939 2017-11-25 1084 17 5
#15 985152014315939 2017-11-26 3489 17 5
#16 985152014315939 2017-11-27 2630 17 5
#17 985152014315939 2017-11-28 3585 17 5
#18 985152014315939 2017-11-29 3433 17 5
#19 985152014315939 2017-11-30 2962 17 5
Sample data
df <- read.table(text =
"RFID visit_date ADFI location
985152014315936 2017-11-25 2133 16
985152014315936 2017-11-26 2186 16
985152014315936 2017-11-27 3489 16
985152014315936 2017-11-28 2432 16
985152014315937 2017-11-24 15 17
985152014315937 2017-11-25 1512 17
985152014315937 2017-11-26 2378 17
985152014315937 2017-11-27 3241 17
985152014315938 2017-11-24 584 17
985152014315938 2017-11-25 1689 17
985152014315938 2017-11-26 2807 17
985152014315938 2017-11-27 2369 17
985152014315938 2017-11-28 2576 17
985152014315939 2017-11-25 1084 17
985152014315939 2017-11-26 3489 17
985152014315939 2017-11-27 2630 17
985152014315939 2017-11-28 3585 17
985152014315939 2017-11-29 3433 17
985152014315939 2017-11-30 2962 17", header = T)
Using data.table:
data <- data.table(data)
data[, diff := max(as.Date(visit_date)) - min(as.Date(visit_date)), by = RFID]
of if you want to add 1:
data[, diff := max(as.Date(visit_date)) - min(as.Date(visit_date)) + 1, by = RFID]