How to group data and then draw a bar chart in ggplot2

Posted 2019-09-10 07:15

I have a data frame (df) with 3 columns, e.g.

NUMERIC1:      NUMERIC2:      GROUP(CHARACTER):
100            1               A
200            2               B
300            3               C
400            4               A

I want to group NUMERIC1 by GROUP(CHARACTER), and then calculate the mean for each group. Something like this:

mean(NUMERIC1):  GROUP(CHARACTER):
250                  A
200                  B
300                  C

Finally, I'd like to draw a bar chart using ggplot2, with GROUP(CHARACTER) on the x axis and mean(NUMERIC1) on the y axis. It should look like this:

[image: chart]

I used

means <- tapply(df$NUMERIC1, df$GROUP, FUN = mean)

but I'm not sure whether it's OK, and even if it is, I don't know what I'm supposed to do next.

5 answers
(This account has been banned)
Reply #2 · 2019-09-10 07:35

This is what stat_summary(...) is designed for:

colnames(df) <- c("N1","N2","GROUP")
library(ggplot2)
# Note: ggplot2 3.3.0 and later use `fun`; older versions used `fun.y`
ggplot(df) + stat_summary(aes(x=GROUP,y=N1),fun=mean,geom="bar", 
                          fill="lightblue",col="grey50")
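
As a quick sanity check, and only as a sketch: the data frame below is rebuilt from the question's table with the columns already renamed as above, and layer_data() is used to read back what ggplot2 computed for the bars. The names p and bar_heights are illustrative, and the fun argument assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Sample data from the question, with the columns already named N1, N2, GROUP
df <- data.frame(N1 = c(100, 200, 300, 400),
                 N2 = 1:4,
                 GROUP = c("A", "B", "C", "A"))

# stat_summary() applies `fun` to the y values within each x group
p <- ggplot(df) +
  stat_summary(aes(x = GROUP, y = N1), fun = mean, geom = "bar",
               fill = "lightblue", col = "grey50")

# layer_data() returns the data computed for a layer; y holds the bar heights
bar_heights <- layer_data(p)$y
print(bar_heights)
```

If the printed heights match the per-group means you expect, the summary is doing what you want and no separate aggregation step is needed.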

虎瘦雄心在
Reply #3 · 2019-09-10 07:42

Try something like:

res <- aggregate(NUMERIC1 ~ GROUP, data = df, FUN = mean)
ggplot(res, aes(x = GROUP, y = NUMERIC1)) + geom_bar(stat = "identity")
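
For what it's worth, the tapply() call from the question should give the same means as aggregate(); it just returns a named array instead of a data frame. A minimal base-R sketch of the comparison, assuming the grouping column is named GROUP (the names means and res_from_tapply are only illustrative):

```r
# Sample data from the question, assuming the columns are NUMERIC1, NUMERIC2, GROUP
df <- data.frame(NUMERIC1 = c(100, 200, 300, 400),
                 NUMERIC2 = 1:4,
                 GROUP = c("A", "B", "C", "A"))

# Data-frame summary, as in the answer above
res <- aggregate(NUMERIC1 ~ GROUP, data = df, FUN = mean)

# The question's approach: a named array of group means
means <- tapply(df$NUMERIC1, df$GROUP, FUN = mean)

# Convert the named array into a data frame that ggplot() could take
res_from_tapply <- data.frame(GROUP = names(means),
                              NUMERIC1 = as.vector(means))

# Both routes should agree on the group means
print(all.equal(res$NUMERIC1, res_from_tapply$NUMERIC1))
```

Either data frame can then be passed to the ggplot() call above.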

data

df <- structure(list(NUMERIC1 = c(100L, 200L, 300L, 400L), NUMERIC2 = 1:4, 
    GROUP = structure(c(1L, 2L, 3L, 1L), .Label = c("A", "B", 
    "C"), class = "factor")), .Names = c("NUMERIC1", "NUMERIC2", 
"GROUP"), class = "data.frame", row.names = c(NA, -4L))
我命由我不由天
Reply #4 · 2019-09-10 07:45

Here's a solution using dplyr to create the summary. In this case, the summary is created on the fly within ggplot, but you can also create a separate summary data frame first and then feed that to ggplot.

library(dplyr)
library(ggplot2)

ggplot(df %>% group_by(GROUP) %>% 
         summarise(`Mean NUMERIC1`=mean(NUMERIC1)),
       aes(GROUP, `Mean NUMERIC1`)) + 
  geom_bar(stat="identity", fill=hcl(195,100,65))
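
The "separate summary data frame" variant mentioned above could look roughly like this. It is a sketch on the question's sample data; the names group_means and mean_numeric1 are made up for illustration, and geom_col() is used as shorthand for geom_bar(stat="identity").

```r
library(dplyr)
library(ggplot2)

# Sample data from the question, assuming the columns are NUMERIC1, NUMERIC2, GROUP
df <- data.frame(NUMERIC1 = c(100, 200, 300, 400),
                 NUMERIC2 = 1:4,
                 GROUP = c("A", "B", "C", "A"))

# Step 1: build the summary once and keep it, so it can be inspected or reused
group_means <- df %>%
  group_by(GROUP) %>%
  summarise(mean_numeric1 = mean(NUMERIC1))

# Step 2: plot the precomputed means; geom_col() draws bars at the given y values
p <- ggplot(group_means, aes(GROUP, mean_numeric1)) +
  geom_col(fill = hcl(195, 100, 65))
print(p)
```

Keeping the summary as its own object makes it easy to check the numbers before plotting them.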

Since you're plotting means rather than counts, it might make more sense to use points rather than bars. For example:

ggplot(df %>% group_by(GROUP) %>% 
         summarise(`Mean NUMERIC1`=mean(NUMERIC1)),
       aes(GROUP, `Mean NUMERIC1`)) + 
  geom_point(pch=21, size=5, fill="blue") + 
  coord_cartesian(ylim=c(0,310))
成全新的幸福
Reply #5 · 2019-09-10 07:46

I'd suggest something like:

#Imports: data.table, which allows for really convenient "apply a function to
#each part of a df, by unique value", and ggplot2
library(data.table)
library(ggplot2)

#Convert df to a data.table. It remains a data.frame, so any function that works
#on a data.frame can still work here.
data <- as.data.table(df)

#By each unique value in "GROUP" (assuming that is the name of the grouping
#column, as in the sample data), subset and calculate the mean of the NUMERIC1
#values within that subset. You end up with a data.frame/data.table with the
#columns GROUP and mean_value
data <- data[, j = list(mean_value = mean(NUMERIC1)), by = "GROUP"]

#And now we play the plotting game (the plotting game is boring, let's
#play Hungry Hungry Hippos!). The means are already computed, so use
#stat = "identity"; geom_bar()'s default stat counts rows and rejects a y aesthetic.
plot <- ggplot(data, aes(GROUP, mean_value)) + geom_bar(stat = "identity")
plot

#And that should do it.
唯我独甜
Reply #6 · 2019-09-10 07:58

Why use ggplot when you could do the same with your own tapply() code and base R's barplot():

barplot(tapply(df$NUMERIC1, df$GROUP, FUN=mean))
