Generate self reference key within the table using

2019-08-04 01:27发布

问题:

I have an input table with 3 columns (Person_Id, Visit_Id (unique Id for each visit and each person) and Purpose) as shown below. I would like to generate another new column which provides the immediate preceding visit of the person (ex: if person has visited hospital with Visit Id = 2, then I would like to have another column called "Preceding_visit_Id" which will be 1 (ex:2, if visit id = 5, preceding visit id will be 4). Is there a way to do this in a elegant manner using mutate function?

Input Table

Output Table

As you can see that 'Preceding_visit_id' column refers the previous visit of the person which is defined using visit_id column

Please note that this is a transformation for one of the columns in a huge program, so anything elegant would be helpful.

Dput command output is here

structure(list(Person_Id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 
3, 3, 3), Visit_Id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14), Purpose = c("checkup", "checkup", "checkup", "checkup", 
"checkup", "checkup", "checkup", "checkup", "checkup", "checkup", 
"checkup", "checkup", "checkup", "checkup"), Preceding_visit_id = c(NA, 
1, 2, 3, 4, NA, 6, 7, 8, 9, 10, NA, 12, 12)), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L), spec = 
structure(list(
 cols = list(Person_Id = structure(list(), class = c("collector_double", 
"collector")), Visit_Id = structure(list(), class = c("collector_double", 
"collector")), Purpose = structure(list(), class = 
 c("collector_character", 
"collector")), Preceding_visit_id = structure(list(), class = 
 c("collector_double", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1), class = "col_spec"))'''

回答1:

The Person_Id fields in your examples don't match.

I'm not sure if this is what you're after, but from your dput() I have created a file that removes the last column:

df_input <- df_output %>% 
  select(-Preceding_visit_id)

Then done this:

df_input %>% 
  group_by(Person_Id) %>% 
  mutate(Preceding_visit_id = lag(Visit_Id))

And the output is this:

# A tibble: 14 x 4
# Groups:   Person_Id [3]
   Person_Id Visit_Id Purpose Preceding_visit_id
       <dbl>    <dbl> <chr>                <dbl>
 1         1        1 checkup                 NA
 2         1        2 checkup                  1
 3         1        3 checkup                  2
 4         1        4 checkup                  3
 5         1        5 checkup                  4
 6         2        6 checkup                 NA
 7         2        7 checkup                  6
 8         2        8 checkup                  7
 9         2        9 checkup                  8
10         2       10 checkup                  9
11         2       11 checkup                 10
12         3       12 checkup                 NA
13         3       13 checkup                 12
14         3       14 checkup                 13