I'm trying to avoid a time-consuming for loop by using aggregate on a data.frame, but I need the values of one of the columns to enter into the final computation.
dat <- data.frame(key = c('a', 'b', 'a','b'),
rate = c(0.5,0.4,1,0.6),
v1 = c(4,0,3,1),
v2 = c(2,0,9,4))
>dat
key rate v1 v2
1 a 0.5 4 2
2 b 0.4 0 0
3 a 1.0 3 9
4 b 0.6 1 4
aggregate(dat[,-1], list(key=dat$key),
function(x, y=dat$rate){
rates <- as.numeric(y)
values <- as.numeric(x)
return(sum(values*rates)/sum(rates))
})
Note: The function is just an example!
The problem with this implementation is that y=dat$rate
gives all 4 rates in dat, when what I want is just the 2 rates that belong to the group being aggregated!
Any suggestions on how I could do this?
Thanks!
Here's what I managed to achieve, using the "data.table" package:
OK. So that's easy to write out for just two variables, but what about when we have a lot more columns? Use lapply(.SD, ...) in conjunction with your function:
First, some data:
Second, aggregate:
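The code from this answer did not survive, so here is a minimal sketch of what the lapply(.SD, ...) approach could look like, not necessarily what was originally posted. It assumes the data.table package is installed; wavg is a hypothetical helper name standing in for the example function from the question. Note that the table is built with as.data.table() because data.table() has its own key argument, which would clash with a column called key.

```r
library(data.table)  # external package, assumed to be installed

# Same example data as in the question
dat <- data.frame(key  = c("a", "b", "a", "b"),
                  rate = c(0.5, 0.4, 1, 0.6),
                  v1   = c(4, 0, 3, 1),
                  v2   = c(2, 0, 9, 4))
dt <- as.data.table(dat)

# Hypothetical helper: the rate-weighted mean used as the example function
wavg <- function(values, rates) sum(values * rates) / sum(rates)

# .SD contains only the columns listed in .SDcols, one group at a time.
# `rate` is still visible inside j and is already restricted to the
# rows of the current group, so no manual subsetting is needed.
res <- dt[, lapply(.SD, wavg, rates = rate),
          by = "key", .SDcols = c("v1", "v2")]
print(res)
```

To cover more value columns, only the .SDcols vector should need to change.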
If you have a really large dataset, you might want to explore data.table in general.
For what it is worth, I was also successful in base R, but I'm not sure how efficient this would be, particularly because of the transposing and so on.
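The base R code is missing here as well. One way it could be done (an assumption on my part, chosen because it also ends with a transpose) is to split the data frame by key and apply the function to each piece, so that the rates always come from the same rows as the values. The name value_cols is just for illustration.

```r
# Same example data as in the question
dat <- data.frame(key  = c("a", "b", "a", "b"),
                  rate = c(0.5, 0.4, 1, 0.6),
                  v1   = c(4, 0, 3, 1),
                  v2   = c(2, 0, 9, 4))

value_cols <- c("v1", "v2")  # columns to aggregate (illustrative name)

# split() yields one data frame per key; inside each piece, d$rate
# holds only that group's rates
per_group <- sapply(split(dat, dat$key), function(d) {
  sapply(d[value_cols], function(x) sum(x * d$rate) / sum(d$rate))
})

# sapply() returns value columns as rows and keys as columns,
# so transpose to get one row per key
res_base <- t(per_group)
print(res_base)
```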
One solution is to use ddply from the plyr package:
If you want to apply this to all the v columns, I would recommend first changing your data structure a bit:
and then using ddply again:
...or if you need a standard R solution, you can use by:
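The original snippet is not available, so this is only a rough sketch of how by could be used for the example in the question. by hands the function a data frame holding just the rows of one key, which is what makes the per-group rates available. The names res_by and res_mat are made up for this example.

```r
# Same example data as in the question
dat <- data.frame(key  = c("a", "b", "a", "b"),
                  rate = c(0.5, 0.4, 1, 0.6),
                  v1   = c(4, 0, 3, 1),
                  v2   = c(2, 0, 9, 4))

# The function receives the subset of rows for one key at a time,
# so d$rate and the value columns always line up
res_by <- by(dat, dat$key, function(d) {
  sapply(d[c("v1", "v2")], function(x) sum(x * d$rate) / sum(d$rate))
})

# Bind the per-key named vectors into one matrix, one row per key
res_mat <- do.call(rbind, res_by)
print(res_mat)
```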