R/ggplot2 non-trivial aggregation function using m

Posted 2019-07-31 18:52

I would like to use ggplot2 (R) to draw a bar graph of aggregated values, where the aggregate is computed from multiple numeric columns of a table and plotted against a categorical column of that table (which is also the "group by" column).

df:

V1  V2  categorical
 1   1     c1
 2   1     c2
 1   3     c2
 2   3     c3

The aggregate function I effectively want is:

sum(V1 * V2) / sum(V2)
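To make the example reproducible, here is a minimal base-R sketch that builds the table above as a data frame (column names taken from the table) and evaluates the aggregate separately within each category using `split()`.

```r
# The example table from the question
df <- data.frame(V1 = c(1, 2, 1, 2),
                 V2 = c(1, 1, 3, 3),
                 categorical = c("c1", "c2", "c2", "c3"))

# sum(V1 * V2) / sum(V2), evaluated separately within each category
expected <- sapply(split(df, df$categorical),
                   function(g) sum(g$V1 * g$V2) / sum(g$V2))
expected
```

For c2 this is (2·1 + 1·3) / (1 + 3) = 5/4 = 1.25. Note that sum(V1 * V2) / sum(V2) is simply the weighted mean of V1 with weights V2.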

I attempted this:

ggplot(df, aes(x = categorical)) +
   stat_summary_bin(aes(y = V1 * V2), 
                    fun.args = list(d = df$V2), 
                    fun.y = function(y, d) sum(y) / sum(d), 
                    geom = "bar")

but the resulting values were lower than expected. My desired result is c1: 1, c2: 1.25, c3: 2, but the actual result is:

[Image: actual result]
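The low bars most likely come from `fun.args = list(d = df$V2)`: it passes the *entire* V2 column to every call of the summary function, so `sum(d)` is 1 + 1 + 3 + 3 = 8 in every category instead of that category's own sum of V2. The sketch below reproduces that arithmetic in base R without calling ggplot2. (As an aside, ggplot2 3.3.0 deprecated `fun.y` in favour of `fun`.)

```r
df <- data.frame(V1 = c(1, 2, 1, 2),
                 V2 = c(1, 1, 3, 3),
                 categorical = c("c1", "c2", "c2", "c3"))

# Per-category numerator, as the attempted summary computes it
numerator <- tapply(df$V1 * df$V2, df$categorical, sum)

# What the attempt effectively does: divide by the sum of the WHOLE V2 column
attempted <- numerator / sum(df$V2)

# What was intended: divide by each category's own sum of V2
intended <- numerator / tapply(df$V2, df$categorical, sum)

rbind(attempted, intended)
```

The `attempted` row gives 0.125, 0.625 and 0.75, i.e. each intended value scaled down by the share of V2 that falls outside the category.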

Tags: r ggplot2
3 answers
太酷不给撩
Post #2 · 2019-07-31 19:18

The best way to create the desired plot is to compute the statistic yourself before calling ggplot. Here is the code using tidyverse tools:

library(tidyverse)
df %>%
  group_by(categorical) %>%                      # one group per category
  summarise(stat = sum(V1 * V2) / sum(V2)) %>%   # one row per category
  ggplot(aes(categorical, stat)) +
    geom_bar(stat = "identity")                  # plot the precomputed values as-is

Notes:

  1. With stat = "identity", geom_bar doesn't perform any computation; it just plots the precomputed values. It is designed for exactly this kind of situation.

  2. I presume the output for c2 should be 1.25.
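Because sum(V1 * V2) / sum(V2) is the weighted mean of V1 with weights V2, the `summarise()` step can equivalently use base R's `weighted.mean()`. A sketch of that variant, assuming dplyr and ggplot2 are installed; `geom_col()` is ggplot2's shorthand for `geom_bar(stat = "identity")`.

```r
library(dplyr)
library(ggplot2)

df <- data.frame(V1 = c(1, 2, 1, 2),
                 V2 = c(1, 1, 3, 3),
                 categorical = c("c1", "c2", "c2", "c3"))

# weighted.mean(V1, V2) equals sum(V1 * V2) / sum(V2) within each group
stats <- df %>%
  group_by(categorical) %>%
  summarise(stat = weighted.mean(V1, V2))

# geom_col() plots the precomputed heights as-is; print(p) draws the chart
p <- ggplot(stats, aes(categorical, stat)) +
  geom_col()
```

Using `weighted.mean()` states the intent directly and avoids writing the numerator and denominator by hand.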

你好瞎i
Post #3 · 2019-07-31 19:22

This is a bit tricky because ggplot works row by row: with stat = "identity", each row becomes its own bar segment and the segments within a category are stacked, so the bar height is the sum of the per-row values. You instead want to compute two different sums separately and then display a single value per category. I'm not sure how to express this directly within ggplot. However, you can do it by first adding a value column to the data frame. (I'm assuming that c2 was meant to be 1.25 and that your 1.5 was a mistake...)

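library(nlme)     # provides gapply()
library(ggplot2)

df <- data.frame(V1 = c(1, 2, 1, 2), V2 = c(1, 1, 3, 3),
                 categorical = c("c1", "c2", "c2", "c3"))

# Give every row of a group the group's ratio divided by the group size,
# so that the stacked per-row bars add up to sum(V1 * V2) / sum(V2)
find.val <- function(df) {
  df$value <- sum(df$V1 * df$V2) / (sum(df$V2) * length(df$categorical))
  return(df)
}
df <- do.call(rbind.data.frame, gapply(df, groups = df$categorical, FUN = find.val))

ggplot(df, aes(x = categorical, y = value)) + geom_bar(stat = "identity")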
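The same per-row `value` column can also be built without the nlme dependency. A minimal sketch using base R's `ave()`, which returns a per-row vector holding each group's result; like `find.val`, it divides the group's ratio by the group size so that the stacked segments add back up to the target.

```r
library(ggplot2)

df <- data.frame(V1 = c(1, 2, 1, 2),
                 V2 = c(1, 1, 3, 3),
                 categorical = c("c1", "c2", "c2", "c3"))

# ave() repeats each group's result on every row of that group
num <- ave(df$V1 * df$V2, df$categorical, FUN = sum)  # sum(V1 * V2) per group
den <- ave(df$V2, df$categorical, FUN = sum)          # sum(V2) per group
n   <- ave(df$V2, df$categorical, FUN = length)       # rows per group

# Split the group's ratio evenly across its rows; stacking re-adds them
df$value <- num / den / n

p <- ggplot(df, aes(x = categorical, y = value)) +
  geom_col()
```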
Fickle 薄情
Post #4 · 2019-07-31 19:23

This will also work:

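library(ggplot2)

# One row per category: sum(V1 * V2) / sum(V2).
# aggregate() returns groups in sorted (factor-level) order, matching sort(unique(...))
df_agg <- data.frame(categorical = sort(unique(df$categorical)),
  V1_V2 = aggregate(V1 * V2 ~ categorical, df, sum)[, 2] /
          aggregate(V2 ~ categorical, df, sum)[, 2])

ggplot(df_agg) +
  geom_bar(aes(categorical, V1_V2), stat = "identity")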
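A variant of the same `aggregate()` idea that calls it only once, so the numerator and denominator come from the same grouped result. It is a sketch that adds a helper column (named `V1V2` here purely for illustration) so the `cbind()` left-hand side of the formula can refer to plain column names.

```r
library(ggplot2)

df <- data.frame(V1 = c(1, 2, 1, 2),
                 V2 = c(1, 1, 3, 3),
                 categorical = c("c1", "c2", "c2", "c3"))

# Hypothetical helper column holding the per-row product
df$V1V2 <- df$V1 * df$V2

# Sum both columns per category in one call: one row per category
agg <- aggregate(cbind(V1V2, V2) ~ categorical, data = df, FUN = sum)
agg$V1_V2 <- agg$V1V2 / agg$V2

p <- ggplot(agg, aes(categorical, V1_V2)) +
  geom_col()
```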
