How to add rows with 0 counts to summarised output

Posted 2019-01-20 05:31

Question:

Sample data is below; I used dplyr to count by Rco and month:

structure(list(Rco = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 3L, 3L, 4L, 4L, 4L), .Label = c("A220", "B334", "C123", "D445"
), class = "factor"), month = structure(c(3L, 2L, 4L, 1L, 3L, 
2L, 4L, 1L, 3L, 4L, 2L, 4L, 3L), .Label = c("Apr", "Feb", "Jan", 
"Mar"), class = "factor"), count = c(1, 2, 3, 4, 5, 6, 7, 8, 
9, 10, 11, 12, 13)), .Names = c("Rco", "month", "count"), row.names = c(NA, 
-13L), class = "data.frame")

Is there a way to transform this data to:

structure(list(Rco = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("A220", "B334", 
"C123", "D445"), class = "factor"), month = structure(c(3L, 2L, 
4L, 1L, 3L, 2L, 4L, 1L, 3L, 2L, 4L, 1L, 3L, 2L, 4L, 1L), .Label = c("Apr", 
"Feb", "Jan", "Mar"), class = "factor"), count = c(1, 2, 3, 4, 
5, 6, 7, 8, 9, 0, 10, 0, 13, 11, 12, 0)), .Names = c("Rco", "month", 
"count"), row.names = c(NA, -16L), class = "data.frame")

So basically I need to add a row with a count of 0 for every Rco-month combination that is missing, because dplyr::count does not return 0 counts for combinations that do not exist in the data.

The number of months varies in my data (I have shown Jan, Feb, Mar and Apr, but it could be all 12 months), so I would be grateful for a dynamic solution.

Answer 1:

You can use tidyr::complete and specify the fill to be 0 (instead of the default NA):

library(tidyr)
complete(df, Rco, month, fill = list(count = 0))
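
For reference, here is a minimal, self-contained sketch of the same call run on the question's sample data. It assumes the data frame is named df, rebuilds it with data.frame() instead of structure() for readability, and stores the result in out (a name chosen here for illustration).

```r
library(tidyr)

# Sample data from the question, rebuilt with data.frame().
# factor() sorts levels alphabetically (Apr, Feb, Jan, Mar), as in the original.
df <- data.frame(
  Rco   = factor(c("A220", "A220", "A220", "A220",
                   "B334", "B334", "B334", "B334",
                   "C123", "C123",
                   "D445", "D445", "D445")),
  month = factor(c("Jan", "Feb", "Mar", "Apr",
                   "Jan", "Feb", "Mar", "Apr",
                   "Jan", "Mar",
                   "Feb", "Mar", "Jan")),
  count = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
)

# Expand to every Rco x month combination; missing counts become 0, not NA.
out <- complete(df, Rco, month, fill = list(count = 0))

nrow(out)              # 16 rows: 4 Rco values x 4 months
out[out$count == 0, ]  # the three added rows: C123/Apr, C123/Feb, D445/Apr
```

Nothing here hard-codes the months, so the same call should work unchanged when more months appear in the data. Because month is a factor, complete() expands over its levels; if the column is created with all 12 month levels, all 12 would be filled in even for months that never occur.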


Answer 2:

We can use expand.grid on the unique values of the first two columns, and merge the result with the initial dataset. With all.x=TRUE, this fills count with NA for combinations that are not present in the initial dataset.

res <- merge(expand.grid(lapply(df1[1:2], unique)), df1, all.x=TRUE)

It is then easy to change the NAs to 0:

res[is.na(res)] <- 0
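
Put together, the base R approach can be sketched as follows on the question's sample data. This assumes the data frame is named df1 and rebuilds it with data.frame() for readability; grid is a helper name introduced here for illustration.

```r
# Sample data from the question, rebuilt with data.frame().
df1 <- data.frame(
  Rco   = factor(c("A220", "A220", "A220", "A220",
                   "B334", "B334", "B334", "B334",
                   "C123", "C123",
                   "D445", "D445", "D445")),
  month = factor(c("Jan", "Feb", "Mar", "Apr",
                   "Jan", "Feb", "Mar", "Apr",
                   "Jan", "Mar",
                   "Feb", "Mar", "Jan")),
  count = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
)

# All combinations of the unique Rco and month values (4 x 4 = 16 rows).
grid <- expand.grid(lapply(df1[1:2], unique))

# Left join on the shared columns (Rco, month); unmatched rows get count = NA.
res <- merge(grid, df1, all.x = TRUE)

# Replace the NAs in the count column only, leaving the key columns untouched.
res$count[is.na(res$count)] <- 0

nrow(res)  # 16
```

Restricting the replacement to res$count is a slightly more defensive variant of res[is.na(res)] <- 0, which would also touch any other column that happens to contain NA.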


Tags: r count dplyr rows