Creating a ggplot2 histogram with a cumulative dis

Using ggplot2, I can create a histogram with a cumulative distribution curve with the following code. However, the stat_ecdf curve is scaled to the left y-axis.

library(ggplot2)
test.data <- data.frame(values = replicate(1, sample(0:10,1000, rep=TRUE)))
g <- ggplot(test.data, aes(x=values))
g + geom_bar() + 
    stat_ecdf() + 
    scale_y_continuous(sec.axis=sec_axis(trans = ~./100, name="percentage"))

Here is the graph generated (you can see the ecdf at the bottom):

How do I scale the stat_ecdf to the second y-axis?

Tags: r ggplot2

1 answer

Juvenile、少年°

#2 · 2019-07-27 15:58

In general, you want to multiply the internally calculated ECDF value (the cumulative proportion, which runs from 0 to 1 and is called ..y..) by the inverse of the secondary-axis transformation, so that its vertical extent is similar to that of the bars. Here the secondary axis divides by 100, so the ECDF is multiplied by 100:

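library(ggplot2)
library(scales)  # provides percent() for the secondary-axis labels

set.seed(2)
test.data <- data.frame(values = sample(0:10, 1000, replace = TRUE))

ggplot(test.data, aes(x=values)) +
  geom_bar(fill="grey70") + 
  # Stretch the ECDF (0 to 1) to the range of the bar counts (roughly 0 to 100)
  stat_ecdf(aes(y=..y..*100)) + 
  # The secondary axis divides by 100, mapping the curve back to 0-1
  scale_y_continuous(sec.axis=sec_axis(trans = ~./100 , name="percentage", labels=percent)) +
  theme_bw()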
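The scaling logic can be checked outside ggplot. This is a minimal base-R sketch with a made-up vector and variable names of my own choosing (scale_factor, plotted, read_back). It shows that multiplying the ECDF by the factor and then applying the secondary-axis transformation gives back the original proportions:

```r
values <- c(0, 1, 1, 2, 3, 3, 3, 4)   # made-up data, not the question's sample
scale_factor <- 100                   # same factor as in the plot above

Fn <- ecdf(values)                    # base R empirical CDF
xs <- sort(unique(values))

proportion <- Fn(xs)                      # 0 to 1, what the curve represents
plotted    <- proportion * scale_factor   # position on the primary (count) axis
read_back  <- plotted / scale_factor      # what sec_axis(~ . / 100) would display

# The secondary axis shows the original proportions again
stopifnot(isTRUE(all.equal(read_back, proportion)))
```

The factor only moves the curve on the primary axis; the secondary axis undoes it, so any positive factor gives correct percentages. The choice of factor only affects how well the curve fills the panel.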

Because you distributed 1,000 values randomly among 11 buckets, each bar holds roughly 91 values, so the count axis happens to top out near 100 and a factor of 100 works out neatly. Below is a more general version.

In addition, it would be nice to determine the transformation factor programmatically, so that we don't have to pick it by hand after seeing the bar heights in the plot. To do that, we calculate the height of the tallest bar outside ggplot and use that value (called max_y below) in the plot. We also use the pretty function to round max_y up to the highest break value of a y-axis that covers the tallest bar (ggplot's default axis breaks usually coincide with those from pretty), so that the primary and secondary y-axis breaks line up.
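To see what the rounding step does, here is a small base-R sketch. The counts vector is hypothetical and stands in for table(test.data$values):

```r
# Hypothetical bar heights, standing in for table(test.data$values)
counts <- c(61, 83, 70)

tallest <- max(counts)             # height of the tallest bar: 83
breaks  <- pretty(c(0, tallest))   # round axis breaks covering 0 to 83
max_y   <- max(breaks)             # top break, used as the scaling factor

print(breaks)   # 0 20 40 60 80 100
print(max_y)    # 100
```

With the top break as the factor, an ECDF value of 1 lands exactly on the top gridline, and the other secondary-axis labels fall on the same gridlines as the count breaks.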

Finally, we use aes_ and bquote to create a quoted call in which .(max_y) is replaced by its numeric value, so that ggplot sees a constant rather than a variable it cannot find when it evaluates ..y..

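set.seed(2)
test.data <- data.frame(values = sample(0:10, 768, replace = TRUE))

# Height of the tallest bar, rounded up to the top "pretty" axis break
max_y <- max(table(test.data$values))
max_y <- max(pretty(c(0, max_y)))

ggplot(test.data, aes(x=values)) +
  geom_bar(fill="grey70") + 
  # bquote() substitutes the numeric value of max_y for .(max_y)
  stat_ecdf(aes_(y=bquote(..y.. * .(max_y)))) + 
  # The secondary axis divides by max_y, mapping the curve back to 0-1
  scale_y_continuous(sec.axis=sec_axis(trans = ~./max_y, name="percentage", labels=percent)) +
  theme_bw()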
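As far as I know, recent ggplot2 releases deprecate both aes_() and the ..y.. notation. One way to avoid them is to compute the ECDF outside ggplot with base R's ecdf() and draw it with geom_step(). This is a different technique from the stat_ecdf() approach above, sketched under that assumption. The names ecdf_df and p are my own, and the sec_axis() transformation is passed positionally so the sketch does not depend on the argument's name:

```r
library(ggplot2)

set.seed(2)
test.data <- data.frame(values = sample(0:10, 768, replace = TRUE))

# Tallest bar, rounded up to the top "pretty" axis break
max_y <- max(pretty(c(0, max(table(test.data$values)))))

# Precompute the ECDF at each distinct value and scale it to the count axis
xs <- sort(unique(test.data$values))
ecdf_df <- data.frame(
  x = xs,
  y = ecdf(test.data$values)(xs) * max_y
)

p <- ggplot(test.data, aes(x = values)) +
  geom_bar(fill = "grey70") +
  # Step curve from the precomputed, already scaled ECDF
  geom_step(data = ecdf_df, aes(x = x, y = y)) +
  # Secondary axis divides by max_y to show proportions as percentages
  scale_y_continuous(
    sec.axis = sec_axis(~ . / max_y, name = "percentage",
                        labels = scales::percent)
  ) +
  theme_bw()

p
```

Unlike stat_ecdf(), which by default pads the curve so it extends to both edges of the panel, this step curve starts at the smallest observed value and ends at the largest.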
