I would like to use ggplot (R) to draw a bar graph of aggregated values, computed from multiple numeric columns of a table, against a categorical column of that table (which is also the "group by" column).
df:
V1 V2 categorical
1 1 c1
2 1 c2
1 3 c2
2 3 c3
The aggregate function I want is:
sum(V1 * V2) / sum(V2)
I attempted this:
ggplot(df, aes(x = categorical)) +
  stat_summary_bin(aes(y = V1 * V2),
                   fun.args = list(d = df$V2),
                   fun.y = function(y, d) sum(y) / sum(d),
                   geom = "bar")
but the resulting values were lower than expected. My desired result is c1: 1, c2: 1.25, c3: 2.
The best way to create the desired plot is to compute the desired statistics manually before calling ggplot. Here is the code using tidyverse tools:
library(tidyverse)
df %>%
  group_by(categorical) %>%
  summarise(stat = sum(V1 * V2) / sum(V2)) %>%
  ggplot(aes(categorical, stat)) +
  geom_bar(stat = "identity")
Notes:
With stat = "identity", geom_bar doesn't perform any computation and just plots the precomputed values. It was designed specifically for situations like yours.
At c2 the output should be 1.25, I presume.
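As far as I can tell, the attempt in the question comes out low because fun.args = list(d = df$V2) hands the entire V2 column to every group, so each group's numerator is divided by the whole-table sum of V2 (8) instead of the group's own sum. The base R sketch below reproduces that arithmetic without ggplot; the names attempt and wanted are mine, and the reading of how the summary function is called is an assumption rather than something I checked in the ggplot2 source.

```r
# Data from the question
df <- data.frame(
  V1 = c(1, 2, 1, 2),
  V2 = c(1, 1, 3, 3),
  categorical = c("c1", "c2", "c2", "c3")
)

# What the attempt appears to compute:
# per-group numerator, but the denominator is sum(V2) over the whole table
attempt <- tapply(df$V1 * df$V2, df$categorical,
                  function(y) sum(y) / sum(df$V2))

# What was wanted: numerator and denominator both computed per group
wanted <- tapply(df$V1 * df$V2, df$categorical, sum) /
  tapply(df$V2, df$categorical, sum)

print(attempt)
print(wanted)
```

The second vector matches the desired values in the question, which is why precomputing the statistic per group and plotting it with stat = "identity" gives the expected bars.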
This is a bit tricky because ggplot wants to sum values for each row, whereas you want to sum two different calculations individually and then just display a single value for all rows. I'm not sure how to call this explicitly within ggplot. However, you can do it by adding a value column to the data frame first. (I'm assuming that c2 was meant to be 1.25, and your 1.5 was a mistake...)
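library(ggplot2)
library(nlme)  # for gapply()
df <- data.frame(V1 = c(1, 2, 1, 2), V2 = c(1, 1, 3, 3), categorical = c("c1", "c2", "c2", "c3"))
# Give each row its group's statistic divided by the group's row count,
# so that the stacked bars add up to sum(V1 * V2) / sum(V2) per category
find.val <- function(df) {
  df$value <- sum(df$V1 * df$V2) / (sum(df$V2) * length(df$categorical))
  return(df)
}
# Apply find.val to each category and bind the pieces back together
df <- do.call(rbind.data.frame, gapply(df, groups = df$categorical, FUN = find.val))
ggplot(df, aes(x = categorical, y = value)) + geom_bar(stat = "identity")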
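The division by length(df$categorical) matters because geom_bar(stat = "identity") stacks one segment per row, so a category with two rows would otherwise show twice its statistic. Here is a rough check of that idea using base R's ave() in place of gapply (a swapped-in technique, not the code above); the helper names n_rows, group_stat and stacked are made up for illustration.

```r
# Data from the question
df <- data.frame(
  V1 = c(1, 2, 1, 2),
  V2 = c(1, 1, 3, 3),
  categorical = c("c1", "c2", "c2", "c3")
)

# Number of rows in each row's category
n_rows <- ave(df$V2, df$categorical, FUN = length)

# Group statistic sum(V1 * V2) / sum(V2), repeated on every row of the group
group_stat <- ave(df$V1 * df$V2, df$categorical, FUN = sum) /
  ave(df$V2, df$categorical, FUN = sum)

# Per-row share, so that the rows of a category add up to the group statistic
df$value <- group_stat / n_rows

# Height of each stacked bar = sum of the per-row shares in that category
stacked <- tapply(df$value, df$categorical, sum)
print(stacked)
```

Each c2 row carries 0.625, and the two stacked segments add up to 1.25.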
This will also work:
df <- data.frame(categorical = sort(unique(df$categorical)),
                 V1_V2 = aggregate(V1 * V2 ~ categorical, df, sum)[, 2] / aggregate(V2 ~ categorical, df, sum)[, 2])
ggplot(df) +
  geom_bar(aes(categorical, V1_V2), stat = "identity")
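Since sum(V1 * V2) / sum(V2) is the mean of V1 weighted by V2, the same summary can probably be written more directly with base R's weighted.mean(). This is only a sketch: summary_df and stat are illustrative names, and the plot is drawn only if ggplot2 is installed.

```r
# Data from the question
df <- data.frame(
  V1 = c(1, 2, 1, 2),
  V2 = c(1, 1, 3, 3),
  categorical = c("c1", "c2", "c2", "c3")
)

# One weighted mean of V1 (weights V2) per category
stat <- sapply(split(df, df$categorical),
               function(g) weighted.mean(g$V1, w = g$V2))

# One row per category, ready for plotting
summary_df <- data.frame(categorical = names(stat), stat = unname(stat))
print(summary_df)

# geom_col() plots precomputed heights, like geom_bar(stat = "identity")
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(summary_df, ggplot2::aes(categorical, stat)) +
    ggplot2::geom_col()
  print(p)
}
```

Unlike the snippet above, this keeps the original df intact and stores the summary under a separate name.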