I have a data frame in the following format and I want to get table based on aggregate value:
VALUE Time1 Time2
1 NN NF
2 FF FF
7 NF FF
4 NN NN
3 NN FF
3 NF NF
5 NF NF
6 FF FF
I can create a simple table by using the table() function
table(Time1,Time2)
Which gives me the following output
FF FN NF NN Total
FF 2 0 0 0 2
FN 0 0 0 0 0
NF 1 0 2 0 3
Total 3 0 2 0 5
I want the above data frame to be cross tabulated based on the sum of the VALUE column. I can do that in excel using the sumif function and get the following output.
FF FN NF NN Total
FF 8 0 0 0 8
FN 0 0 0 0 0
NF 7 0 8 0 15
NN 3 0 1 4 8
Total 18 0 9 4 31
I need help to do this in R ?
For cases of sum
you can just use xtabs
. Here, I've wrapped it in addmargins
to get the totals too:
addmargins(xtabs(VALUE ~ Time1 + Time2, mydf))
# Time2
# Time1 FF NF NN Sum
# FF 8 0 0 8
# NF 7 8 0 15
# NN 3 1 4 8
# Sum 18 9 4 31
More generally, you might want to look at dcast
from "reshape2":
library(reshape2)
dcast(mydf, Time1 ~ Time2, value.var="VALUE", fun.aggregate=sum, margins=TRUE)
# Time1 FF NF NN (all)
# 1 FF 8 0 0 8
# 2 NF 7 8 0 15
# 3 NN 3 1 4 8
# 4 (all) 18 9 4 31
To address @SimonO101's question, if the data are correctly factored, then all levels will show by default with the xtabs
approach. However, you will need to specify drop = FALSE
with the dcast
version.
Taking the above data (which, as it is does not contain a "Time1" or "Time2" of "FN"), let's factor
both those columns and see how that changes the output:
mydf[-1] <- lapply(mydf[-1], function(x) factor(x, c("FF", "FN", "NF", "NN")))
addmargins(xtabs(VALUE ~ Time1 + Time2, mydf))
# Time2
# Time1 FF FN NF NN Sum
# FF 8 0 0 0 8
# FN 0 0 0 0 0
# NF 7 0 8 0 15
# NN 3 0 1 4 8
# Sum 18 0 9 4 31
As mentioned, the dcast
equivalent would be:
dcast(mydf, Time1 ~ Time2, value.var="VALUE",
fun.aggregate=sum, margins=TRUE, drop=FALSE)