I'm writing my first program in R and, as a newbie, I'm having some trouble. I hope you can help me.
I've got a data frame like this:
> v1<-c(1,1,2,3,3,3,4)
> v2<-c(13,5,15,1,2,7,4)
> v3<-c(0,3,6,13,8,23,5)
> v4<-c(26,25,11,2,8,1,0)
> datos<-data.frame(v1,v2,v3,v4)
> names(datos)<-c("Position","a1","a2","a3")
> datos
  Position a1 a2 a3
1        1 13  0 26
2        1  5  3 25
3        2 15  6 11
4        3  1 13  2
5        3  2  8  8
6        3  7 23  1
7        4  4  5  0
What I need is to sum the data in a1, a2 and a3 (in my real case from a1 to a51) grouped by Position. I'm trying with the function aggregate(), but it only works for means, not for sums, and I don't know why.
Thanks in advance
This is fairly straightforward with the plyr library.
library("plyr")
ddply(datos, .(Position), colwise(sum))
If you have additional non-numeric columns that shouldn't be summed, you can use
ddply(datos, .(Position), numcolwise(sum))
You need to tell the aggregate function which summary to compute by passing it as the FUN argument; to get sums, pass sum. For example:
aggregate(datos[, c("a1", "a2", "a3")], by = list(datos$Position), FUN = sum)
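With many columns (a1 to a51 in the real data), typing every name gets tedious. Here is a minimal sketch, assuming the columns really are named a1, a2, ... consecutively, that builds the names with paste0(). The names cols and sums_by_pos are only illustrative, and the sample data stops at a3. Naming the list element passed to by also labels the grouping column Position instead of the default Group.1.

```r
# Sample data from the question
datos <- data.frame(
  Position = c(1, 1, 2, 3, 3, 3, 4),
  a1 = c(13, 5, 15, 1, 2, 7, 4),
  a2 = c(0, 3, 6, 13, 8, 23, 5),
  a3 = c(26, 25, 11, 2, 8, 1, 0)
)

# Build the column names "a1", "a2", "a3" (1:51 would cover the real data)
cols <- paste0("a", 1:3)

# Sum the selected columns within each Position
sums_by_pos <- aggregate(datos[cols],
                         by = list(Position = datos$Position),
                         FUN = sum)
sums_by_pos
```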
ag_df <- aggregate(. ~ Position, data = datos, FUN = sum)
should give you a data frame containing the sums of the "a" values for each of the positions. The trick here is that the . in the formula stands for every column of datos that does not otherwise appear in the formula, i.e. all the "non-grouping" variables.
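As a quick check, here is the formula version run on the sample data from the question, with the result it prints shown in comments. The same call should work unchanged with a1 to a51, since the . picks up whatever columns are present besides Position.

```r
# Sample data from the question
datos <- data.frame(
  Position = c(1, 1, 2, 3, 3, 3, 4),
  a1 = c(13, 5, 15, 1, 2, 7, 4),
  a2 = c(0, 3, 6, 13, 8, 23, 5),
  a3 = c(26, 25, 11, 2, 8, 1, 0)
)

# "." expands to a1, a2 and a3 because Position is already used on the right
ag_df <- aggregate(. ~ Position, data = datos, FUN = sum)
ag_df
#   Position a1 a2 a3
# 1        1 18  3 51
# 2        2 15  6 11
# 3        3 10 44 11
# 4        4  4  5  0
```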
Note that you can get much the same result with:
sumdf <- rowsum(datos, datos$Position, na.rm = TRUE)
Except that includes the sums of the positions as well!
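One way around that, sketched here on the sample data, is to pass rowsum() only the columns to be summed, so Position is used for grouping but is not summed itself. The group values end up as row names rather than as a column.

```r
# Sample data from the question
datos <- data.frame(
  Position = c(1, 1, 2, 3, 3, 3, 4),
  a1 = c(13, 5, 15, 1, 2, 7, 4),
  a2 = c(0, 3, 6, 13, 8, 23, 5),
  a3 = c(26, 25, 11, 2, 8, 1, 0)
)

# Leave Position out of the data being summed; use it only as the group
sumdf <- rowsum(datos[, c("a1", "a2", "a3")], group = datos$Position)
sumdf
#   a1 a2 a3
# 1 18  3 51
# 2 15  6 11
# 3 10 44 11
# 4  4  5  0
```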
If you DON'T want all non-group columns aggregated, you can use cbind as in:
sumdf1 <- aggregate(cbind(a1, a3) ~ Position, data = datos, FUN = sum)
That sums only the a1 and a3 columns.