Question:
I have a data frame (df) with 3 columns, e.g.:
NUMERIC1: NUMERIC2: GROUP(CHARACTER):
100 1 A
200 2 B
300 3 C
400 4 A
I want to group NUMERIC1 by GROUP(CHARACTER), and then calculate the mean for each group.
Something like this:
mean(NUMERIC1): GROUP(CHARACTER):
250 A
200 B
300 C
Finally, I'd like to draw a bar chart using ggplot2 with GROUP(CHARACTER) on the x axis and mean(NUMERIC1) on the y axis.
It should look like:
I used
mean <- tapply(df$NUMERIC1, df$GROUP(CHARACTER), FUN=mean)
but I'm not sure if that's OK, and even if it is, I don't know what I'm supposed to do next.
Answer 1:
Try something like:
res <- aggregate(NUMERIC1 ~ GROUP, data = df, FUN = mean)
ggplot(res, aes(x = GROUP, y = NUMERIC1)) + geom_bar(stat = "identity")
data
df <- structure(list(NUMERIC1 = c(100L, 200L, 300L, 400L), NUMERIC2 = 1:4,
GROUP = structure(c(1L, 2L, 3L, 1L), .Label = c("A", "B",
"C"), class = "factor")), .Names = c("NUMERIC1", "NUMERIC2",
"GROUP"), class = "data.frame", row.names = c(NA, -4L))
Answer 2:
This is what stat_summary(...) is designed for:
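colnames(df) <- c("N1","N2","GROUP")
library(ggplot2)
# fun = mean computes the per-group mean; it replaces fun.y, which is
# deprecated as of ggplot2 3.3.0
ggplot(df) + stat_summary(aes(x=GROUP,y=N1),fun=mean,geom="bar",
fill="lightblue",col="grey50")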
Answer 3:
I'd suggest something like:
#Imports: data.table, which allows for really convenient "apply a function to
#each part of a df, by unique value", and ggplot2
library(data.table)
library(ggplot2)
#Convert df to a data.table. It remains a data.frame, so any function that works
#on a data.frame can still work here.
data <- as.data.table(df)
#By each unique value in "GROUP" (the grouping column of the example df),
#subset and calculate the mean of the NUMERIC1 values within that subset.
#You end up with a data.frame/data.table with the columns GROUP and mean_value
data <- data[, j = list(mean_value = mean(NUMERIC1)), by = "GROUP"]
#And now we play the plotting game (the plotting game is boring, let's
#play Hungry Hungry Hippos!)
#stat = "identity" makes the bar heights equal to mean_value; the default
#stat counts rows and fails when a y aesthetic is supplied
plot <- ggplot(data, aes(GROUP, mean_value)) + geom_bar(stat = "identity")
#Printing the plot object draws it
print(plot)
#And that should do it.
Answer 4:
Here's a solution using dplyr to create the summary. In this case, the summary is created on the fly within ggplot, but you can also create a separate summary data frame first and then feed that to ggplot.
library(dplyr)
library(ggplot2)
ggplot(df %>% group_by(GROUP) %>%
summarise(`Mean NUMERIC1`=mean(NUMERIC1)),
aes(GROUP, `Mean NUMERIC1`)) +
geom_bar(stat="identity", fill=hcl(195,100,65))
Since you're plotting means rather than counts, it might make more sense to use points rather than bars. For example:
ggplot(df %>% group_by(GROUP) %>%
summarise(`Mean NUMERIC1`=mean(NUMERIC1)),
aes(GROUP, `Mean NUMERIC1`)) +
geom_point(pch=21, size=5, fill="blue") +
coord_cartesian(ylim=c(0,310))
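The two-step variant mentioned above (build the summary data frame first, then feed it to ggplot) might look like the sketch below. It assumes a df with the same values as in the question; the names summary_df, mean_numeric1 and p are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Example data with the same values as the df in the question
df <- data.frame(
  NUMERIC1 = c(100, 200, 300, 400),
  NUMERIC2 = 1:4,
  GROUP    = c("A", "B", "C", "A")
)

# Step 1: one row per group holding the mean of NUMERIC1
# (summary_df and mean_numeric1 are illustrative names)
summary_df <- df %>%
  group_by(GROUP) %>%
  summarise(mean_numeric1 = mean(NUMERIC1))

# Step 2: plot the precomputed means; stat = "identity" uses the
# values themselves as bar heights instead of counting rows
p <- ggplot(summary_df, aes(x = GROUP, y = mean_numeric1)) +
  geom_bar(stat = "identity")
print(p)
```

Keeping the summary in its own object makes it easy to inspect the means (or reuse them elsewhere) before plotting.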
Answer 5:
Why use ggplot when you could do the same with your own code and barplot:
barplot(tapply(df$NUMERIC1, df$GROUP, FUN=mean))
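If ggplot2 is still wanted, the tapply() result from the question can probably be reused by turning it into a data frame first. This is only a sketch: the names group_means, means_df and mean_numeric1 are invented here, and the data frame repeats the question's example values.

```r
library(ggplot2)

# Example data with the same values as the df in the question
df <- data.frame(
  NUMERIC1 = c(100, 200, 300, 400),
  GROUP    = c("A", "B", "C", "A")
)

# tapply() returns a named array: the names are the groups,
# the values are the per-group means
group_means <- tapply(df$NUMERIC1, df$GROUP, FUN = mean)

# Reshape into a data frame so ggplot2 can map columns to aesthetics
# (means_df and mean_numeric1 are illustrative names)
means_df <- data.frame(
  GROUP         = names(group_means),
  mean_numeric1 = as.vector(group_means)
)

# geom_col() draws bars whose heights are the y values
p <- ggplot(means_df, aes(x = GROUP, y = mean_numeric1)) + geom_col()
print(p)
```

Storing the result under a name other than mean also avoids masking the base mean() function, which the question's assignment would do.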