Using R and tidyverse library I am trying to achieve pivot like result. Here it sample data set:
zz <- " Date ParAB ParCD
1 2017-05-27 A C
2 2017-05-27 B D
3 2017-05-27 A D
4 2017-05-27 B C
5 2017-05-27 B C
6 2017-05-28 A D
7 2017-05-28 A C
8 2017-05-28 A C
9 2017-05-28 A D"
Data <- read.table(text=zz, header = TRUE)}
I would like transform the data to look like this with number of occurrences per day:
Date A B C D
2017-05-27 2 3 3 2
2017-05-28 2 0 1 1
I tired spread function that works great on ParAB column.
Data %>%
group_by(Date, ParAB, ParCD) %>%
summarise(occr = n()) %>%
spread(ParAB, occr, fill = 0) %>%
mutate(occrCD = A+B)
so the result is:
# A tibble: 4 x 5
# Groups: Date [2]
Date ParCD A B occrCD
<fctr> <fctr> <dbl> <dbl> <dbl>
1 2017-05-27 C 1 2 3
2 2017-05-27 D 1 1 2
3 2017-05-28 C 2 0 2
4 2017-05-28 D 2 0 2
and then when I try spread second time it does not work as intended. The data for column A (and B) is not added for particular day for C and D rows. As result I get wrong data.
Code with both steps:
Data %>%
group_by(Date, ParAB, ParCD) %>%
summarise(occr = n()) %>%
spread(ParAB, occr, fill = 0) %>% # first spread - result as expected
mutate(occrCD = A+B) %>%
spread(ParCD, occrCD, fill = 0) %>% # second spread, lost sum for A and B
group_by(Date) %>%
summarise_all(sum)
and the result that is not what I wanted. The error is visible as A+B should be equal for C +D but for 2017-05-28 it is not. :(
# A tibble: 2 x 5
Date A B C D
<fctr> <dbl> <dbl> <dbl> <dbl>
1 2017-05-27 2 3 3 2
2 2017-05-28 2 0 2 2
I am sure is pretty trivial, but as I am quite new your help highly appreciated.
M
No reason to
spread
twice if you put all your parameters in one column.