difference between the first date and last date wi

2019-08-16 03:54发布

问题:

Good afternoon I am not an R user, but I need to get the difference between the first date and last date within RFID, to create a new column X. Therefore, the first value needs to be 1 (not zero), the second 2, ..., n.

Here an example of the data.

Thanks in advance.

RFID              visit_date   ADFI   location
985152014315936   2017-11-25   2133   16
985152014315936   2017-11-26   2186   16
985152014315936   2017-11-27   3489   16
985152014315936   2017-11-28   2432   16
985152014315937   2017-11-24     15   17
985152014315937   2017-11-25   1512   17
985152014315937   2017-11-26   2378   17
985152014315937   2017-11-27   3241   17
985152014315938   2017-11-24    584   17
985152014315938   2017-11-25   1689   17
985152014315938   2017-11-26   2807   17
985152014315938   2017-11-27   2369   17
985152014315938   2017-11-28   2576   17
985152014315939   2017-11-25   1084   17
985152014315939   2017-11-26   3489   17
985152014315939   2017-11-27   2630   17
985152014315939   2017-11-28   3585   17
985152014315939   2017-11-29   3433   17
985152014315939   2017-11-30   2962   17

回答1:

Here is a solution using dplyr and lubridate:

require(tidyverse);
require(lubridate);

df %>% group_by(RFID) %>% mutate(X = max(ymd(visit_date)) - min(ymd(visit_date)));
## A tibble: 19 x 5
## Groups:   RFID [4]
#              RFID visit_date  ADFI location X
#             <dbl> <fct>      <int>    <int> <time>
# 1 985152014315936 2017-11-25  2133       16 3
# 2 985152014315936 2017-11-26  2186       16 3
# 3 985152014315936 2017-11-27  3489       16 3
# 4 985152014315936 2017-11-28  2432       16 3
# 5 985152014315937 2017-11-24    15       17 3
# 6 985152014315937 2017-11-25  1512       17 3
# 7 985152014315937 2017-11-26  2378       17 3
# 8 985152014315937 2017-11-27  3241       17 3
# 9 985152014315938 2017-11-24   584       17 4
#10 985152014315938 2017-11-25  1689       17 4
#11 985152014315938 2017-11-26  2807       17 4
#12 985152014315938 2017-11-27  2369       17 4
#13 985152014315938 2017-11-28  2576       17 4
#14 985152014315939 2017-11-25  1084       17 5
#15 985152014315939 2017-11-26  3489       17 5
#16 985152014315939 2017-11-27  2630       17 5
#17 985152014315939 2017-11-28  3585       17 5
#18 985152014315939 2017-11-29  3433       17 5
#19 985152014315939 2017-11-30  2962       17 5

Sample data

df <- read.table(text =
    "RFID              visit_date   ADFI   location
985152014315936   2017-11-25   2133   16
985152014315936   2017-11-26   2186   16
985152014315936   2017-11-27   3489   16
985152014315936   2017-11-28   2432   16
985152014315937   2017-11-24     15   17
985152014315937   2017-11-25   1512   17
985152014315937   2017-11-26   2378   17
985152014315937   2017-11-27   3241   17
985152014315938   2017-11-24    584   17
985152014315938   2017-11-25   1689   17
985152014315938   2017-11-26   2807   17
985152014315938   2017-11-27   2369   17
985152014315938   2017-11-28   2576   17
985152014315939   2017-11-25   1084   17
985152014315939   2017-11-26   3489   17
985152014315939   2017-11-27   2630   17
985152014315939   2017-11-28   3585   17
985152014315939   2017-11-29   3433   17
985152014315939   2017-11-30   2962   17", header = T)


回答2:

Using data.table:

data <- data.table(data)
data[, diff := max(as.Date(visit_date)) - min(as.Date(visit_date)), by = RFID]

of if you want to add 1:

data[, diff := max(as.Date(visit_date)) - min(as.Date(visit_date)) + 1, by = RFID]