Collapsing a data frame over one variable

Posted 2019-03-06 10:42

Question:

I have a data frame in the following format:

Site    Year    Month   Count1  Count2  Count3  Patch
1       1       May     15      12      10      1
1       1       May     8       0       5       2
1       1       May     3       1       2       3
1       1       May     4       4       1       4
1       1       June    6       5       1       1
1       1       June    9       1       3       2
1       1       June    3       0       0       3
1       1       June    5       5       2       4
1       1       July    4       0       3       1
..........

I want to collapse the data frame across the levels of Patch so that the three count variables are summed within each Site, Year and Month, i.e.:

Site    Year    Month   Count1  Count2  Count3
1       1       May     30      17      18
1       1       June    23      11      6
1       1       July    4       0       3
.........

I've looked at the aggregate and tapply commands, but they do not seem to sum across patch as required.

Can somebody please advise on a command that will convert the data accordingly?

Thank you.

Answer 1:

With aggregate:

> ( a <- aggregate(.~Site+Year+Month, dat[-length(dat)], sum) )
#   Site Year Month Count1 Count2 Count3
# 1    1    1  July      4      0      3
# 2    1    1  June     23     11      6
# 3    1    1   May     30     17     18

Where dat is your data.
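
For a reproducible run, here is a minimal sketch that rebuilds dat from only the nine rows shown in the question (the elided rows are not guessed, so July has a single row) and applies the same aggregate call. It assumes R 4.0 or later, where data.frame() keeps Month as a character column.

```r
# Rebuild only the nine rows shown in the question; the elided rows are left out.
dat <- data.frame(
  Site   = rep(1, 9),
  Year   = rep(1, 9),
  Month  = c(rep("May", 4), rep("June", 4), "July"),
  Count1 = c(15, 8, 3, 4, 6, 9, 3, 5, 4),
  Count2 = c(12, 0, 1, 4, 5, 1, 0, 5, 0),
  Count3 = c(10, 5, 2, 1, 1, 3, 0, 2, 3),
  Patch  = c(1, 2, 3, 4, 1, 2, 3, 4, 1)
)

# dat[-length(dat)] drops the last column (Patch), so it is neither
# used for grouping nor summed. The "." on the left-hand side stands for
# every remaining column that is not a grouping variable.
a <- aggregate(. ~ Site + Year + Month, data = dat[-length(dat)], FUN = sum)
print(a)
```

The rows come back sorted by the grouping variables, which is why Month appears alphabetically rather than in calendar order.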

Note that the July results in your post seem to be incorrect.

To get the result in the order in which the months first appear in the original data, you can use

> a[order(match(a$Month, unique(dat$Month))), ]
#   Site Year Month Count1 Count2 Count3
# 3    1    1   May     30     17     18
# 2    1    1  June     23     11      6
# 1    1    1  July      4      0      3
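
An alternative sketch, not part of the original answer: if Month is turned into a factor whose levels follow the order of first appearance, aggregate sorts its output by those levels, so no reordering step is needed afterwards. The data frame below is an assumption rebuilt from only the nine rows shown in the question.

```r
# Only the nine rows shown in the question; the elided rows are left out.
dat <- data.frame(
  Site   = rep(1, 9),
  Year   = rep(1, 9),
  Month  = c(rep("May", 4), rep("June", 4), "July"),
  Count1 = c(15, 8, 3, 4, 6, 9, 3, 5, 4),
  Count2 = c(12, 0, 1, 4, 5, 1, 0, 5, 0),
  Count3 = c(10, 5, 2, 1, 1, 3, 0, 2, 3),
  Patch  = c(1, 2, 3, 4, 1, 2, 3, 4, 1)
)

# Fix the level order to the order in which months first appear in the data.
dat$Month <- factor(dat$Month, levels = unique(dat$Month))

# Groups are now sorted by factor level (May, June, July), not alphabetically.
a2 <- aggregate(. ~ Site + Year + Month, data = dat[-length(dat)], FUN = sum)
print(a2)
```

This only reproduces calendar order when the months already appear in calendar order in the data; otherwise the levels would have to be given explicitly.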


Answer 2:

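Or a data.table solution (which keeps the groups in the order in which the months first appear in the data):

library(data.table)
# df is the original data frame; setDT() converts it to a data.table by reference
setDT(df)[, lapply(.SD, sum),                     # sum every column in .SD
            by = list(Site, Year, Month),         # one row per Site/Year/Month
            .SDcols = paste0("Count", seq_len(3))] # .SD = Count1, Count2, Count3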

#    Site Year Month Count1 Count2 Count3
# 1:    1    1   May     30     17     18
# 2:    1    1  June     23     11      6
# 3:    1    1  July      4      0      3


Answer 3:

library(dplyr)
dat %>%
  group_by(Site, Year, Month) %>%
  summarise_each(funs(sum=sum(., na.rm=TRUE)), Count1:Count3)
# Source: local data frame [3 x 6]
# Groups: Site, Year
#
#   Site Year Month Count1 Count2 Count3
# 1    1    1  July      4      0      3
# 2    1    1  June     23     11      6
# 3    1    1   May     30     17     18
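
summarise_each() and funs() are deprecated in current dplyr releases. A sketch of the same grouping and summing written with across(), assuming dplyr 1.0 or later, might look like the following; the data frame is again rebuilt from only the nine rows shown in the question.

```r
library(dplyr)

# Only the nine rows shown in the question; the elided rows are left out.
dat <- data.frame(
  Site   = rep(1, 9),
  Year   = rep(1, 9),
  Month  = c(rep("May", 4), rep("June", 4), "July"),
  Count1 = c(15, 8, 3, 4, 6, 9, 3, 5, 4),
  Count2 = c(12, 0, 1, 4, 5, 1, 0, 5, 0),
  Count3 = c(10, 5, 2, 1, 1, 3, 0, 2, 3),
  Patch  = c(1, 2, 3, 4, 1, 2, 3, 4, 1)
)

res <- dat %>%
  group_by(Site, Year, Month) %>%
  # Sum Count1..Count3 within each group; Patch is not selected, so it is dropped.
  summarise(across(Count1:Count3, ~ sum(.x, na.rm = TRUE)), .groups = "drop")
print(res)
```

Setting .groups = "drop" returns an ungrouped tibble, which avoids carrying the Site and Year grouping into later steps.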