Julia: Create summary values for column x for each

Posted 2019-05-23 12:02

I would like to apply some functions, such as mean and variance, to column x of my DataFrame for each unique value in column y. I can imagine building a loop that manually subsets the DataFrame to do this, but I am trying not to reinvent the wheel for something that is likely a common feature.

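using DataFrames, Random, Statistics  # Random provides randstring, Statistics provides mean
mydf = DataFrame(y = [randstring(1) for i in 1:1000], x = rand(1000))
# I could imagine a function that looks like this (pseudocode, not a real function):
# apply(func = mean, across = mydf.x, by = mydf.y)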

1 answer
兄弟一词,经得起流年.
#2 · 2019-05-23 12:10

You're right, this is very common. Take a look at the split-apply-combine chapter in the documentation. There are several approaches here: you can either use the more general by function to specify exactly which columns you want to operate on, or you can use the handy aggregate function, which applies the given function to all the other columns and names the results sensibly:

julia> aggregate(mydf, :y, mean)
62×2 DataFrames.DataFrame
│ Row │ y   │ x_mean   │
├─────┼─────┼──────────┤
│ 1   │ "0" │ 0.454196 │
│ 2   │ "1" │ 0.541434 │
│ 3   │ "2" │ 0.36734  │
⋮
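
Note that by and aggregate were deprecated and later removed in newer DataFrames.jl releases, so the call above may not work on a current install. A minimal sketch of the same split-apply-combine step using groupby and combine follows, assuming a recent DataFrames.jl version. The name summary_df and the fixed seed are only for illustration and are not part of the original question.

```julia
using DataFrames, Random, Statistics

Random.seed!(1)  # fixed seed (an assumption) so repeated runs give the same data
mydf = DataFrame(y = [randstring(1) for i in 1:1000], x = rand(1000))

# Split the rows by y, then compute the mean and variance of x within each group.
summary_df = combine(groupby(mydf, :y), :x => mean, :x => var)

# One row per distinct value of y, with columns y, x_mean and x_var.
first(summary_df, 3)
```

Each `:x => f` pair adds one summary column, so further statistics (for example `:x => median`) can be appended to the same combine call.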