I can't wrap my mind around the ave function. I read the help and searched the net, but I still cannot understand what it does. I understand it applies some function to subsets of the observations, but not in the same way as, for example, tapply.
Could someone please enlighten me, perhaps with a small example?
Thanks, and excuse me if this is an unusual request.
tapply returns a single result for each factor level. With the default FUN = mean, ave also computes a single result per factor level, but it then copies that value to every position in the original data that belongs to that level.
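One way to see the relationship between the two is to take the tapply result and look up each row's group in it by hand. The sketch below does that and checks the result against ave; the names group_means, expanded and per_row are illustrative only.

```r
# Per-group means: one value per species (length 3)
group_means <- tapply(iris$Sepal.Length, iris$Species, mean)

# Look up each row's species in that table: one value per row (length 150)
expanded <- as.vector(group_means[as.character(iris$Species)])

# ave() performs the same expansion in a single call
per_row <- ave(iris$Sepal.Length, iris$Species, FUN = mean)

all.equal(expanded, per_row)
```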
ave is handy for adding a column of group summaries to a data frame, because its result has the same length and order as its input.
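A minimal sketch of that use, working on a copy called iris2 and using illustrative column names (species_mean, deviation). FUN is passed by name because every unnamed argument after x is treated as a grouping variable.

```r
# Work on a copy so the built-in iris data set is left untouched
iris2 <- iris

# Species mean, repeated on every row belonging to that species
iris2$species_mean <- ave(iris2$Sepal.Length, iris2$Species, FUN = mean)

# A row-level quantity that needs the group summary: distance from the species mean
iris2$deviation <- iris2$Sepal.Length - iris2$species_mean

head(iris2[, c("Sepal.Length", "Species", "species_mean", "deviation")])
```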
A short example:
tapply(iris$Sepal.Length, iris$Species, FUN=mean)
    setosa versicolor  virginica
     5.006      5.936      6.588
One value per factor level: the mean Sepal.Length of each species.
ave on iris produces 150 results, one per row, which line up with the original data frame:
ave(iris$Sepal.Length, iris$Species, FUN=mean)
[1] 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006
[17] 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006
[33] 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006 5.006
[49] 5.006 5.006 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936
[65] 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936
[81] 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936 5.936
[97] 5.936 5.936 5.936 5.936 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588
[113] 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588
[129] 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588 6.588
[145] 6.588 6.588 6.588 6.588 6.588 6.588
As noted in the comments, the single value computed for each group is recycled to fill every position that belongs to that group.
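The recycling follows from how the grouped case works: split x by group, apply FUN to each piece, and write each result back into that group's positions. my_ave below is a hypothetical re-implementation written for illustration, assuming a single grouping variable; base R's ave follows the same split, apply, and assign-back idea (and also accepts several grouping variables), but this sketch is not its source.

```r
# my_ave: hypothetical, simplified version of ave() with one grouping variable
my_ave <- function(x, g, FUN = mean) {
  pieces  <- split(x, g)          # one vector per group
  results <- lapply(pieces, FUN)  # apply FUN within each group
  split(x, g) <- results          # write each result back into its group's positions;
                                  # a length-1 result is recycled over the group
  x
}

my_ave(iris$Sepal.Length, iris$Species)
```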
If the function returns several values, they are placed back into the group's positions in order, and recycled if necessary to fill them. For example, rev reverses the values within each group:
d <- data.frame(a=rep(1:2, each=5), b=1:10)
ave(d$b, d$a, FUN=rev)
[1] 5 4 3 2 1 10 9 8 7 6
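Functions that return one value per observation are where ave is most useful beyond group means. A short sketch with the same d data frame (the column names running_total and row_in_group are illustrative): cumsum gives a running total within each group, and seq_along numbers the rows within each group.

```r
d <- data.frame(a = rep(1:2, each = 5), b = 1:10)

# Running total of b, restarting at each new level of a
d$running_total <- ave(d$b, d$a, FUN = cumsum)

# Position of each row within its group: 1, 2, 3, ...
d$row_in_group <- ave(d$b, d$a, FUN = seq_along)

d
```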
Thanks to Josh and thelatemail.