Generate table based on aggregate values in R

2019-09-19 06:11发布

I have a data frame in the following format and I want to get table based on aggregate value:

VALUE   Time1   Time2
   1    NN  NF
   2    FF  FF
   7    NF  FF
   4    NN  NN
   3    NN  FF
   3    NF  NF
   5    NF  NF
   6    FF  FF

I can create a simple table by using the table() function

 table(Time1,Time2)

Which gives me the following output

     FF FN  NF  NN  Total
 FF  2  0   0   0    2
 FN  0  0   0   0    0
 NF  1  0   2   0    3

Total 3 0   2   0    5

I want the above data frame to be cross tabulated based on the sum of the VALUE column. I can do that in excel using the sumif function and get the following output.

    FF  FN  NF  NN  Total
 FF 8   0   0   0   8
 FN 0   0   0   0   0
 NF 7   0   8   0   15
 NN 3   0   1   4   8
 Total  18  0   9   4   31

I need help to do this in R ?

1条回答
虎瘦雄心在
2楼-- · 2019-09-19 06:41

For cases of sum you can just use xtabs. Here, I've wrapped it in addmargins to get the totals too:

addmargins(xtabs(VALUE ~ Time1 + Time2, mydf))
#      Time2
# Time1 FF NF NN Sum
#   FF   8  0  0   8
#   NF   7  8  0  15
#   NN   3  1  4   8
#   Sum 18  9  4  31

More generally, you might want to look at dcast from "reshape2":

library(reshape2)
dcast(mydf, Time1 ~ Time2, value.var="VALUE", fun.aggregate=sum, margins=TRUE)
#   Time1 FF NF NN (all)
# 1    FF  8  0  0     8
# 2    NF  7  8  0    15
# 3    NN  3  1  4     8
# 4 (all) 18  9  4    31

To address @SimonO101's question, if the data are correctly factored, then all levels will show by default with the xtabs approach. However, you will need to specify drop = FALSE with the dcast version.

Taking the above data (which, as it is does not contain a "Time1" or "Time2" of "FN"), let's factor both those columns and see how that changes the output:

mydf[-1] <- lapply(mydf[-1], function(x) factor(x, c("FF", "FN", "NF", "NN")))
addmargins(xtabs(VALUE ~ Time1 + Time2, mydf))
#      Time2
# Time1 FF FN NF NN Sum
#   FF   8  0  0  0   8
#   FN   0  0  0  0   0
#   NF   7  0  8  0  15
#   NN   3  0  1  4   8
#   Sum 18  0  9  4  31

As mentioned, the dcast equivalent would be:

dcast(mydf, Time1 ~ Time2, value.var="VALUE", 
      fun.aggregate=sum, margins=TRUE, drop=FALSE)
查看更多
登录 后发表回答