I would like to apply some functions, such as mean and variance, to column x of my DataFrame for each unique value in column y. I can imagine building a loop that manually subsets the DataFrame to accomplish this, but I am trying not to reinvent the wheel for something that is likely a common feature.
using DataFrames
mydf = DataFrame(y = [randstring(1) for i in 1:1000], x = rand(1000))
# I could imagine a function that looks like:
# apply(function = mean, across = mydf[:x], by = mydf[:y])
You're right, this is very common. Take a look at the split-apply-combine chapter in the documentation. There are several approaches here: you can either use the more general by function to specify exactly which columns you want to operate over, or you can use the handy aggregate function to use all the other columns and automatically name them sensibly:
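For readers on DataFrames 1.0 or later, where by and aggregate are no longer available, the same split-apply-combine idea can be sketched with groupby and combine. This is a minimal sketch, not the code from the original answer: it assumes a recent DataFrames release plus the Random and Statistics standard libraries, and the output names x_avg and x_variance are purely illustrative.

```julia
using DataFrames, Random, Statistics

Random.seed!(1)  # fixed seed only so the sketch is reproducible
mydf = DataFrame(y = [randstring(1) for i in 1:1000], x = rand(1000))

# Split mydf by the unique values of y, then apply mean and var to x
# within each group. The result columns are auto-named x_mean and x_var.
stats = combine(groupby(mydf, :y), :x => mean, :x => var)

# Same computation with explicit (illustrative) output column names.
named = combine(groupby(mydf, :y),
                :x => mean => :x_avg,
                :x => var => :x_variance)
```

Each result has one row per distinct value of y. A group that happens to contain a single row gets NaN for its variance.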