R - Pivot like data

2019-07-28 21:25发布

问题:

Using R and tidyverse library I am trying to achieve pivot like result. Here it sample data set:

zz <- "   Date          ParAB ParCD
1         2017-05-27    A     C
2         2017-05-27    B     D
3         2017-05-27    A     D     
4         2017-05-27    B     C     
5         2017-05-27    B     C     
6         2017-05-28    A     D     
7         2017-05-28    A     C     
8         2017-05-28    A     C
9         2017-05-28    A     D"

Data <- read.table(text=zz, header = TRUE)}

I would like transform the data to look like this with number of occurrences per day:

Date           A        B        C       D
2017-05-27     2        3        3       2
2017-05-28     2        0        1       1

I tired spread function that works great on ParAB column.

Data %>%
  group_by(Date, ParAB, ParCD) %>%
  summarise(occr = n()) %>%
  spread(ParAB, occr, fill = 0) %>%
  mutate(occrCD = A+B)

so the result is:

# A tibble: 4 x 5
# Groups:   Date [2]
    Date        ParCD     A     B   occrCD
  <fctr>        <fctr> <dbl> <dbl>  <dbl>
1 2017-05-27      C     1     2      3
2 2017-05-27      D     1     1      2
3 2017-05-28      C     2     0      2
4 2017-05-28      D     2     0      2

and then when I try spread second time it does not work as intended. The data for column A (and B) is not added for particular day for C and D rows. As result I get wrong data.

Code with both steps:

Data %>%
  group_by(Date, ParAB, ParCD) %>%
  summarise(occr = n()) %>%
  spread(ParAB, occr, fill = 0) %>% # first spread - result as expected
  mutate(occrCD = A+B) %>%
  spread(ParCD, occrCD, fill = 0) %>% # second spread, lost sum for A and B
  group_by(Date) %>%
  summarise_all(sum)

and the result that is not what I wanted. The error is visible as A+B should be equal for C +D but for 2017-05-28 it is not. :(

# A tibble: 2 x 5
        Date     A     B     C     D
      <fctr> <dbl> <dbl> <dbl> <dbl>
1 2017-05-27     2     3     3     2
2 2017-05-28     2     0     2     2

I am sure is pretty trivial, but as I am quite new your help highly appreciated.

M

回答1:

No reason to spread twice if you put all your parameters in one column.

library(dplyr)
library(tidyr)

zz <- "   Date          ParAB ParCD
1         2017-05-27    A     C
2         2017-05-27    B     D
3         2017-05-27    A     D     
4         2017-05-27    B     C     
5         2017-05-27    B     C     
6         2017-05-28    A     D     
7         2017-05-28    A     C     
8         2017-05-28    A     C
9         2017-05-28    A     D"

Data <- read.table(text=zz, header = TRUE, stringsAsFactors = F)


Data %>%
  gather(v1,value,-Date) %>%
  count(Date, value) %>%
  spread(value, n, fill = 0)

# # A tibble: 2 x 5
#         Date     A     B     C     D
# *      <chr> <dbl> <dbl> <dbl> <dbl>
# 1 2017-05-27     2     3     3     2
# 2 2017-05-28     4     0     2     2


标签: r tidyverse