Aggregate R sum

Posted 2019-01-20 01:45

Question:

I'm writing my first program in R and, as a newbie, I'm running into some trouble. I hope you can help.

I've got a data frame like this:

> v1<-c(1,1,2,3,3,3,4)
> v2<-c(13,5,15,1,2,7,4)
> v3<-c(0,3,6,13,8,23,5)
> v4<-c(26,25,11,2,8,1,0)
> datos<-data.frame(v1,v2,v3,v4)
> names(datos)<-c("Position","a1","a2","a3")

> datos
  Position a1 a2 a3
1        1 13  0 26
2        1  5  3 25
3        2 15  6 11
4        3  1 13  2
5        3  2  8  8
6        3  7 23  1
7        4  4  5  0

What I need is to sum the values in a1, a2 and a3 (in my real data, a1 through a51) grouped by Position. I've been trying aggregate(), but I can only get it to work for means, not for sums, and I don't know why.

Thanks in advance

Answer 1:

This is fairly straightforward with the plyr package.

library("plyr")
ddply(datos, .(Position), colwise(sum))

If you have additional non-numeric columns that shouldn't (or can't) be summed, use numcolwise(), which applies the function only to the numeric columns:

ddply(datos, .(Position), numcolwise(sum))
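For readers without plyr, the same split-apply-combine idea can be sketched in base R. This is a minimal illustration, not plyr's implementation: the helper name sum_by_position is hypothetical, and it assumes every column other than Position is numeric.

```r
# Example data from the question
datos <- data.frame(Position = c(1, 1, 2, 3, 3, 3, 4),
                    a1 = c(13, 5, 15, 1, 2, 7, 4),
                    a2 = c(0, 3, 6, 13, 8, 23, 5),
                    a3 = c(26, 25, 11, 2, 8, 1, 0))

# Hypothetical helper (not part of plyr); assumes all non-Position columns are numeric
sum_by_position <- function(df) {
  value_cols <- setdiff(names(df), "Position")
  # Split: one data frame of value columns per Position
  pieces <- split(df[value_cols], df$Position)
  # Apply: column sums of each piece, stacked one row per group
  # (the list names "1", "2", ... become the row names)
  sums <- do.call(rbind, lapply(pieces, colSums))
  # Combine: turn the group labels back into a Position column
  data.frame(Position = as.numeric(rownames(sums)), sums, row.names = NULL)
}

sum_by_position(datos)
```

Because the grouping column is excluded before summing, Position comes back as the group label rather than being summed itself.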


Answer 2:

You need to tell aggregate() to use sum by passing it as the FUN argument. aggregate() has no built-in default such as the mean; it applies whatever function you give it to each group. For example:

aggregate(datos[, c("a1", "a2", "a3")], by = list(Position = datos$Position), FUN = sum)
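With a1 to a51 in the real data, typing every column name is impractical. A minimal sketch, assuming the value columns are exactly those whose names are "a" followed by digits, selects them with grep() and names the grouping column so the output reads Position instead of Group.1:

```r
# Example data from the question
datos <- data.frame(Position = c(1, 1, 2, 3, 3, 3, 4),
                    a1 = c(13, 5, 15, 1, 2, 7, 4),
                    a2 = c(0, 3, 6, 13, 8, 23, 5),
                    a3 = c(26, 25, 11, 2, 8, 1, 0))

# Assumption: the value columns are exactly those named "a" followed by digits
value_cols <- grep("^a[0-9]+$", names(datos), value = TRUE)

# Naming the element of `by` gives the output's grouping column a readable name
totals <- aggregate(datos[value_cols],
                    by = list(Position = datos$Position),
                    FUN = sum)
totals
# a1 sums per Position: 18 15 10 4
```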


Answer 3:

ag_df <- aggregate(. ~ Position, data = datos, FUN = sum)

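should give you a data frame containing the sums of the "a" columns for each Position. The trick here is the . on the left-hand side of the formula: it stands for every column of datos not already used elsewhere in the formula (here a1, a2 and a3), so in the real data it covers a1 to a51 without listing them.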
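To see that the dot scales to the real case, here is a sketch on made-up data (not the asker's): a Position column plus 51 columns a1 to a51, where every row of column a<i> holds the value i, so each group's sum is just the group size times i.

```r
# Made-up wide data: 3 positions with 2 rows each, plus columns a1 ... a51
wide <- data.frame(Position = rep(1:3, each = 2))
for (i in 1:51) {
  wide[[paste0("a", i)]] <- i   # every row of column a<i> holds the value i
}

# `.` on the left-hand side expands to all 51 a-columns
wide_sums <- aggregate(. ~ Position, data = wide, FUN = sum)
dim(wide_sums)    # 3 rows, 52 columns (Position + a1 ... a51)
wide_sums$a51     # each group has 2 rows, so 2 * 51 = 102
```

One thing to watch with the formula interface: its default na.action is na.omit, so a row with an NA in any of the summed columns is dropped entirely before the sums are taken.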

Note that you can get much the same result with:

sumdf <- rowsum(datos, datos$Position, na.rm = TRUE)

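The differences are that the result also contains the (meaningless) sum of the Position column itself, and the groups appear as row names rather than as a column.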
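Summing the Position column can be avoided by dropping it before calling rowsum(); rowsum() also returns the groups as row names, which can be moved back into a column. A minimal sketch, assuming Position is the first column as in the question:

```r
# Example data from the question
datos <- data.frame(Position = c(1, 1, 2, 3, 3, 3, 4),
                    a1 = c(13, 5, 15, 1, 2, 7, 4),
                    a2 = c(0, 3, 6, 13, 8, 23, 5),
                    a3 = c(26, 25, 11, 2, 8, 1, 0))

# Drop Position (column 1) so it is not summed along with the values
sums <- rowsum(datos[-1], datos$Position)
# rowsum() stores the group values as row names; move them into a column
sums <- cbind(Position = as.numeric(rownames(sums)), sums)
rownames(sums) <- NULL
sums
```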

If you don't want every non-grouping column aggregated, put the columns you do want inside cbind() on the left-hand side of the formula:

sumdf1 <- aggregate(cbind(a1, a3) ~ Position, data = datos, FUN = sum)

That sums only the a1 and a3 columns.
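When the subset is long (say, a few dozen of the 51 columns), the cbind() formula can be built from a character vector instead of being typed out. A sketch, where the vector `wanted` is a hypothetical choice of columns:

```r
# Example data from the question
datos <- data.frame(Position = c(1, 1, 2, 3, 3, 3, 4),
                    a1 = c(13, 5, 15, 1, 2, 7, 4),
                    a2 = c(0, 3, 6, 13, 8, 23, 5),
                    a3 = c(26, 25, 11, 2, 8, 1, 0))

# Hypothetical selection; in the real data this could be paste0("a", 1:25)
wanted <- c("a1", "a3")

# Build the formula cbind(a1, a3) ~ Position from the column names
f <- as.formula(paste0("cbind(", paste(wanted, collapse = ", "), ") ~ Position"))
subset_sums <- aggregate(f, data = datos, FUN = sum)
subset_sums
```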



Tags: r sum aggregate