Let ggplot2 histogram show classwise percentages o

2019-04-21 15:53发布

library(ggplot2)
data = diamonds[, c('carat', 'color')]
data = data[data$color %in% c('D', 'E'), ]

I would like to compare the histogram of carat across color D and E, and use the classwise percentage on the y-axis. The solutions I have tried are as follows:

Solution 1:

ggplot(data=data, aes(carat, fill=color)) +  geom_bar(aes(y=..density..), position='dodge', binwidth = 0.5) + ylab("Percentage") +xlab("Carat")

enter image description here

This is not quite right since the y-axis shows the height of the estimated density.

Solution 2:

 ggplot(data=data, aes(carat, fill=color)) +  geom_histogram(aes(y=(..count..)/sum(..count..)), position='dodge', binwidth = 0.5) + ylab("Percentage") +xlab("Carat")

enter image description here

This is also not I want, because the denominator used to calculate the ratio on the y-axis are the total count of D + E.

Is there a way to display the classwise percentages with ggplot2's stacked histogram? That is, instead of showing (# of obs in bin)/count(D+E) on y axis, I would like it to show (# of obs in bin)/count(D) and (# of obs in bin)/count(E) respectively for two color classes. Thanks.

标签: r ggplot2
2条回答
该账号已被封号
2楼-- · 2019-04-21 16:03

You can scale them by group by using the ..group.. special variable to subset the ..count.. vector. It is pretty ugly because of all the dots, but here it goes

ggplot(data, aes(carat, fill=color)) +
  geom_histogram(aes(y=c(..count..[..group..==1]/sum(..count..[..group..==1]),
                         ..count..[..group..==2]/sum(..count..[..group..==2]))*100),
                 position='dodge', binwidth=0.5) +
  ylab("Percentage") + xlab("Carat")

enter image description here

查看更多
萌系小妹纸
3楼-- · 2019-04-21 16:09

It seems that binning the data outside of ggplot2 is the way to go. But I would still be interested to see if there is a way to do it with ggplot2.

library(dplyr)
breaks = seq(0,4,0.5)

data$carat_cut = cut(data$carat, breaks = breaks)

data_cut = data %>%
  group_by(color, carat_cut) %>%
  summarise (n = n()) %>%
  mutate(freq = n / sum(n))

ggplot(data=data_cut, aes(x = carat_cut, y=freq*100, fill=color)) + geom_bar(stat="identity",position="dodge") + scale_x_discrete(labels = breaks) +  ylab("Percentage") +xlab("Carat")

enter image description here

查看更多
登录 后发表回答