I have a wide data frame, which I then reshape into tall (long) format, as such:
data = data.frame("id"=c(1,2,3,4,5,6,7,8,9,10),
"group"=c(1,1,2,1,2,2,2,2,1,2),
"type"=c(1,1,2,3,2,2,3,3,3,1),
"score1"=sample(1:4,10,replace=TRUE),
"score2"=sample(1:4,10,replace=TRUE),
"score3"=sample(1:4,10,replace=TRUE),
"score4"=sample(1:4,10,replace=TRUE),
"score5"=sample(1:4,10,replace=TRUE),
"weight1"=c(173,109,136,189,186,146,173,102,178,174),
"weight2"=c(147,187,125,126,120,165,142,129,144,197),
"weight3"=c(103,192,102,159,128,179,195,193,135,145),
"weight4"=c(114,182,199,101,111,116,198,123,119,181),
"weight5"=c(159,125,104,171,166,154,197,124,180,154))
library(reshape2)
library(plyr)
data1 <- reshape(data, direction = "long",
varying = list(c(paste0("score",1:5)),c(paste0("weight",1:5))),
v.names = c("score","weight"),
idvar = "id", timevar = "count", times = c(1:5))
data1 <- data1[order(data1$id), ]
And what I want to create is a new data frame like so:
want = data.frame("score"=rep(1:4,6),
"group"=rep(1:2,12),
"type"=rep(1:3,8),
"weightedCOUNT"=NA) # how to calculate this? count(data1, score, wt = weight)
I am just not sure how to calculate weightedCOUNT. It should apply the weights to the score variable, so that the column 'weightedCOUNT' holds a weighted count aggregated by score, group, and type.
An option would be to melt (from data.table, which can take multiple measure patterns), and then, grouped by 'group' and 'type', get the count.
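The code for this step is not shown, so the following is only a minimal sketch of the idea, not the answer's exact code. It assumes the question's data (with a fixed seed added for reproducibility), uses `patterns()` inside `melt()` to stack the score and weight columns together, and uses `dplyr::count()` with `wt = weight` to sum the weights. Grouping by score in addition to group and type is my reading of the question's `want` data frame.

```r
library(data.table)
library(dplyr)

set.seed(1)  # assumption: fixed seed so the random scores are reproducible
data <- data.frame(
  id = 1:10,
  group = c(1, 1, 2, 1, 2, 2, 2, 2, 1, 2),
  type = c(1, 1, 2, 3, 2, 2, 3, 3, 3, 1),
  score1 = sample(1:4, 10, replace = TRUE),
  score2 = sample(1:4, 10, replace = TRUE),
  score3 = sample(1:4, 10, replace = TRUE),
  score4 = sample(1:4, 10, replace = TRUE),
  score5 = sample(1:4, 10, replace = TRUE),
  weight1 = c(173, 109, 136, 189, 186, 146, 173, 102, 178, 174),
  weight2 = c(147, 187, 125, 126, 120, 165, 142, 129, 144, 197),
  weight3 = c(103, 192, 102, 159, 128, 179, 195, 193, 135, 145),
  weight4 = c(114, 182, 199, 101, 111, 116, 198, 123, 119, 181),
  weight5 = c(159, 125, 104, 171, 166, 154, 197, 124, 180, 154)
)

# melt() with two measure patterns yields one value column per pattern;
# all remaining columns (id, group, type) are kept as id variables
long <- melt(as.data.table(data),
             measure.vars = patterns("^score", "^weight"),
             value.name = c("score", "weight"),
             variable.name = "count")

# with wt = weight, count() sums the weights instead of counting rows
res <- long %>%
  count(score, group, type, wt = weight, name = "weightedCOUNT")
print(res)
```

Only combinations that actually occur in the data appear in `res`; combinations with no rows are simply absent.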
If we need to have a complete set of combinations, complete() can be used to fill in the missing ones.
If I understand correctly, weightedCOUNT is the sum of weights grouped by score, group, and type.
For the sake of completeness, I would like to show what the accepted solution looks like when implemented in pure base R and in pure data.table syntax, respectively.

Base R
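As a minimal base R sketch (an assumed reconstruction, since the code itself is not shown here): starting from the question's data, `reshape()` produces the long data frame `data1`, `aggregate()` sums weight within each score/group/type combination, and `order()` sorts the result. The fixed seed and the renaming of the summed column to `weightedCOUNT` are my additions.

```r
set.seed(1)  # assumption: fixed seed so the random scores are reproducible
data <- data.frame(
  id = 1:10,
  group = c(1, 1, 2, 1, 2, 2, 2, 2, 1, 2),
  type = c(1, 1, 2, 3, 2, 2, 3, 3, 3, 1),
  score1 = sample(1:4, 10, replace = TRUE),
  score2 = sample(1:4, 10, replace = TRUE),
  score3 = sample(1:4, 10, replace = TRUE),
  score4 = sample(1:4, 10, replace = TRUE),
  score5 = sample(1:4, 10, replace = TRUE),
  weight1 = c(173, 109, 136, 189, 186, 146, 173, 102, 178, 174),
  weight2 = c(147, 187, 125, 126, 120, 165, 142, 129, 144, 197),
  weight3 = c(103, 192, 102, 159, 128, 179, 195, 193, 135, 145),
  weight4 = c(114, 182, 199, 101, 111, 116, 198, 123, 119, 181),
  weight5 = c(159, 125, 104, 171, 166, 154, 197, 124, 180, 154)
)

# wide -> long for two sets of value columns, as in the question
data1 <- reshape(data, direction = "long",
                 varying = list(paste0("score", 1:5), paste0("weight", 1:5)),
                 v.names = c("score", "weight"),
                 idvar = "id", timevar = "count", times = 1:5)

# final aggregation step: sum of weights per score/group/type
result <- aggregate(weight ~ score + group + type, data = data1, FUN = sum)
names(result)[names(result) == "weight"] <- "weightedCOUNT"

# optional: sort the rows by score, then group, then type
result <- result[order(result$score, result$group, result$type), ]
print(result)
```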
The OP was almost there. He has already reshaped data from wide to long format for multiple value variables. Only the final aggregation step was missing. Afterwards, result can be reordered.

data.table
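A minimal sketch of the pure data.table route, under the same assumptions as before (the question's data with a fixed seed; this is a reconstruction, not the answer's exact code): `melt()` with `patterns()` reshapes both sets of columns at once, and `keyby` groups by score, group, and type while also sorting the output by those columns.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the random scores are reproducible
data <- data.frame(
  id = 1:10,
  group = c(1, 1, 2, 1, 2, 2, 2, 2, 1, 2),
  type = c(1, 1, 2, 3, 2, 2, 3, 3, 3, 1),
  score1 = sample(1:4, 10, replace = TRUE),
  score2 = sample(1:4, 10, replace = TRUE),
  score3 = sample(1:4, 10, replace = TRUE),
  score4 = sample(1:4, 10, replace = TRUE),
  score5 = sample(1:4, 10, replace = TRUE),
  weight1 = c(173, 109, 136, 189, 186, 146, 173, 102, 178, 174),
  weight2 = c(147, 187, 125, 126, 120, 165, 142, 129, 144, 197),
  weight3 = c(103, 192, 102, 159, 128, 179, 195, 193, 135, 145),
  weight4 = c(114, 182, 199, 101, 111, 116, 198, 123, 119, 181),
  weight5 = c(159, 125, 104, 171, 166, 154, 197, 124, 180, 154)
)

# reshape score1..score5 and weight1..weight5 into two long columns
long <- melt(as.data.table(data),
             measure.vars = patterns("^score", "^weight"),
             value.name = c("score", "weight"),
             variable.name = "count")

# keyby groups and sorts by the grouping columns in a single step
result <- long[, .(weightedCOUNT = sum(weight)), keyby = .(score, group, type)]
print(result)
```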
As shown by akrun, melt() from the data.table package can be combined with dplyr. Alternatively, we can stay with the data.table syntax for aggregation. The keyby parameter is used for grouping and ordering the output in one step.
Completion of missing combinations of the grouping variables is also possible in data.table syntax using the cross join function CJ().
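One way this completion could look is sketched below; it is an assumed reconstruction, not the answer's exact code. `CJ()` builds every combination of score 1 to 4, group 1 to 2, and type 1 to 3 (ranges taken from the question's `want` data frame), and joining the aggregated result to that table leaves `NA` in `weightedCOUNT` for combinations that never occur. The fixed seed is my addition.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the random scores are reproducible
data <- data.frame(
  id = 1:10,
  group = c(1, 1, 2, 1, 2, 2, 2, 2, 1, 2),
  type = c(1, 1, 2, 3, 2, 2, 3, 3, 3, 1),
  score1 = sample(1:4, 10, replace = TRUE),
  score2 = sample(1:4, 10, replace = TRUE),
  score3 = sample(1:4, 10, replace = TRUE),
  score4 = sample(1:4, 10, replace = TRUE),
  score5 = sample(1:4, 10, replace = TRUE),
  weight1 = c(173, 109, 136, 189, 186, 146, 173, 102, 178, 174),
  weight2 = c(147, 187, 125, 126, 120, 165, 142, 129, 144, 197),
  weight3 = c(103, 192, 102, 159, 128, 179, 195, 193, 135, 145),
  weight4 = c(114, 182, 199, 101, 111, 116, 198, 123, 119, 181),
  weight5 = c(159, 125, 104, 171, 166, 154, 197, 124, 180, 154)
)

long <- melt(as.data.table(data),
             measure.vars = patterns("^score", "^weight"),
             value.name = c("score", "weight"),
             variable.name = "count")

result <- long[, .(weightedCOUNT = sum(weight)), keyby = .(score, group, type)]

# CJ() creates all 4 * 2 * 3 = 24 combinations of the grouping variables
all_combos <- CJ(score = 1:4, group = c(1, 2), type = c(1, 2, 3))

# join: every combination is kept; unmatched ones get NA in weightedCOUNT
full <- result[all_combos, on = .(score, group, type)]
print(full)
```

If zeros are preferred over `NA` for the empty combinations, `full[is.na(weightedCOUNT), weightedCOUNT := 0]` replaces them in place.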