I would like to use ggplot (R) to draw a bar graph of aggregated values, computed from multiple numeric columns of a table, against a categorical column of the same table (which is also the "group by" column).
df:
V1 V2 categorical
1 1 c1
2 1 c2
1 3 c2
2 3 c3
I want the effective aggregate function for each category to be:
sum(V1 * V2) / sum(V2)
I attempted this:
ggplot(df, aes(x = categorical)) +
stat_summary_bin(aes(y = V1 * V2),
fun.args = list(d = df$V2),
fun.y = function(y, d) sum(y) / sum(d),
geom = "bar")
but the values came out lower than expected. My desired result is c1: 1, c2: 1.25, c3: 2.
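The low values most likely come from `fun.args`: `list(d = df$V2)` is evaluated once, when the layer is created, so inside the summary function `d` is always the whole `V2` column (which sums to 8), not the `V2` values of the current category. A small base-R check, re-creating the question's `df` by hand, compares the intended computation with what the attempt effectively computes:

```r
# Re-create the example data from the question
df <- data.frame(
  V1 = c(1, 2, 1, 2),
  V2 = c(1, 1, 3, 3),
  categorical = c("c1", "c2", "c2", "c3")
)

# Intended: per category, sum(V1 * V2) / sum(V2)
# gives c1 = 1, c2 = 1.25, c3 = 2
desired <- sapply(split(df, df$categorical),
                  function(g) sum(g$V1 * g$V2) / sum(g$V2))

# What the attempt computes: y is split per category, but d is the full
# df$V2 column for every category, so the denominator is always 8
# gives c1 = 0.125, c2 = 0.625, c3 = 0.75
attempt <- sapply(split(df$V1 * df$V2, df$categorical),
                  function(y) sum(y) / sum(df$V2))

print(desired)
print(attempt)
```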
The best way to create the desired plot is to compute the desired statistics manually before calling `ggplot`. Here is the code using tidyverse tools:
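A minimal sketch of this precompute-then-plot approach, assuming the dplyr and ggplot2 packages are installed and re-creating the question's `df` by hand:

```r
library(dplyr)
library(ggplot2)

df <- data.frame(
  V1 = c(1, 2, 1, 2),
  V2 = c(1, 1, 3, 3),
  categorical = c("c1", "c2", "c2", "c3")
)

# Aggregate first: one row per category holding sum(V1 * V2) / sum(V2)
df_summary <- df %>%
  group_by(categorical) %>%
  summarise(value = sum(V1 * V2) / sum(V2))

# stat = "identity" makes geom_bar draw the precomputed heights as they are
p <- ggplot(df_summary, aes(x = categorical, y = value)) +
  geom_bar(stat = "identity")
print(p)
```

Because the grouping happens inside `summarise()`, each sum only sees the rows of its own category, which is exactly what the `fun.args` attempt was missing.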
With `stat = "identity"`, `geom_bar` doesn't perform any computation and just plots the precomputed values. It was designed for exactly this kind of situation. For `c2` the output should be 1.25, I presume.

This is a bit tricky because ggplot wants to sum values for each row, whereas you want to compute two different sums separately, divide one by the other, and display a single value per category. I'm not sure how to express this directly within ggplot. However, you can do it by adding a value column to the data frame first. (I'm assuming that c2 was meant to be 1.25, and your 1.5 was a mistake...)
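One way to build such a value column, sketched here with base R's `ave()` (the exact column definition is an assumption, not necessarily the answer's original code): give each row its share `V1 * V2 / sum(V2 of its category)`. Since `geom_bar(stat = "identity")` stacks the rows of each category by default, each bar's total height is then `sum(V1 * V2) / sum(V2)`.

```r
library(ggplot2)

df <- data.frame(
  V1 = c(1, 2, 1, 2),
  V2 = c(1, 1, 3, 3),
  categorical = c("c1", "c2", "c2", "c3")
)

# ave() returns the per-category sum of V2, aligned with each row
df$value <- df$V1 * df$V2 / ave(df$V2, df$categorical, FUN = sum)

# The default position = "stack" adds up the rows within each category,
# so the bar height equals sum(V1 * V2) / sum(V2) for that category
p <- ggplot(df, aes(x = categorical, y = value)) +
  geom_bar(stat = "identity")
print(p)
```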
This will also work:
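Another route, sketched here as an illustration rather than the answer's own code: `sum(V1 * V2) / sum(V2)` is the weighted mean of `V1` with weights `V2`, so `stats::weighted.mean()` can do the per-category aggregation before plotting.

```r
library(ggplot2)

df <- data.frame(
  V1 = c(1, 2, 1, 2),
  V2 = c(1, 1, 3, 3),
  categorical = c("c1", "c2", "c2", "c3")
)

# Weighted mean of V1 (weights V2) for each category
w <- sapply(split(df, df$categorical),
            function(g) weighted.mean(g$V1, w = g$V2))
df_w <- data.frame(categorical = names(w), value = unname(w))

# geom_col() is shorthand for geom_bar(stat = "identity")
p <- ggplot(df_w, aes(x = categorical, y = value)) +
  geom_col()
print(p)
```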