Reshaping longitudinal value to wide, for data wit

2019-09-29 05:59发布

问题:

R: Reshape data from long to wide, including a date

(I was trying to include data, but the formatting was so horrible that I gave up).

I have a question on reshaping longitudinal data. The trick is that I have multiple rows per subject ('row_num' ranges from 1 to 8).

I've looked at over 20 Stack Overflow pages, some other pages, and Hadley's posts (and articles) on reshaping, and there seems to be a gap I can't find:

I have got multiple rows per subject, and can't find out how to turn that into one row. The trick is that I need to take (say) Q1,...,Q5, and turn it into two or more blocks of questions (Q1.1,...,Q5.1,Q1.2,...Q5.2, ...)

Using one terminology, 'row_num' is a within-subject variable, and everything seems to be based on between-subject variables.

It puzzles and frustrates me, since IMHO the classic longitudinal to wide situation = multiple rows per subject (with one block of variables) to one row per subject (with multiple blocks of variables).

Could somebody direct me to some answers or tutorials which would cover this?

Thank you very much!

Upon edit, here is what a dput of the original data frame would look like:

# A tibble: 6 x 9
  Subject_ID row_num Date          Q1 Q2_text       Q3    Q4    Q5 Q6_text      
       <dbl>   <dbl> <date>     <dbl> <chr>      <dbl> <dbl> <dbl> <chr>        
1          1       1 2019-01-01     4 Because        5     5     1 and so on    
2          1       2 2019-01-02     1 O, bother      5     4     1 NA           
3          1       3 2019-01-03     2 NA             3     4    NA NA           
4          2       1 2018-12-04    NA NA             1     4     1 NA           
5          3       1 2018-12-15     3 In addtion     5    NA     3 NA           
6          3       2 2018-12-26     1 NA             4     3     2 in conclusion

回答1:

We could help if you provided some sample data. The web is full of tutorials for using the following functions that do what you are interested in:

gather() and spread() from the tidyr package

melt() and dcast() from the reshape2 package



回答2:

Here's an approach using tidyr.

library(tidyr)
df2 <- df %>%
  # (optional) First convert all the data columns to text so 
  #   they're readable throughout the process.
  mutate_at(vars(Date:Q6_text), as.character) %>%

  # Gather into long format, where we record the column it came from 
  #   as "question" and the value it held as "value"
  gather(question, value, -Subject_ID, -row_num) %>%

  # Combine the row_num and question into a new column
  unite("question2", c("row_num", "question")) %>%

  # Use that new column to spread everything out
  spread(question2, value)


> df2
  Subject_ID     1_Date 1_Q1  1_Q2_text 1_Q3 1_Q4 1_Q5 1_Q6_text     2_Date 2_Q1 2_Q2_text 2_Q3 2_Q4 2_Q5     2_Q6_text     3_Date 3_Q1 3_Q2_text 3_Q3 3_Q4 3_Q5 3_Q6_text
1          1 2019-01-01    4    Because    5    5    1 and so on 2019-01-02    1 O, bother    5    4    1          <NA> 2019-01-03    2      <NA>    3    4 <NA>      <NA>
2          2 2018-12-04 <NA>       <NA>    1    4    1      <NA>       <NA> <NA>      <NA> <NA> <NA> <NA>          <NA>       <NA> <NA>      <NA> <NA> <NA> <NA>      <NA>
3          3 2018-12-15    3 In addtion    5 <NA>    3      <NA> 2018-12-26    1      <NA>    4    3    2 in conclusion       <NA> <NA>      <NA> <NA> <NA> <NA>      <NA>