Using ggplot2, I can overlay a cumulative distribution curve on a histogram with the code below. However, the stat_ecdf curve is scaled to the left y-axis.
```r
library(ggplot2)

# 1,000 random integers between 0 and 10
test.data <- data.frame(values = sample(0:10, 1000, replace = TRUE))

g <- ggplot(test.data, aes(x = values))
g + geom_bar() +
  stat_ecdf() +
  scale_y_continuous(sec.axis = sec_axis(trans = ~ . / 100, name = "percentage"))
```
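In the generated graph, the ECDF curve sits flat along the bottom of the plot, because it is drawn against the left y-axis, where its values (0 to 1) are tiny compared with the bar counts.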
How do I scale the stat_ecdf curve to the second y-axis?
In general, you want to multiply the internally calculated ECDF value (the cumulative proportion), which is called ..y.., by the inverse of the axis transformation, so that its vertical extent is similar to that of the bars. In your example the secondary axis divides by 100, so the ECDF should be multiplied by 100, e.g. stat_ecdf(aes(y = ..y..*100)). Because you distributed 1,000 values randomly among 11 buckets, both y-scales happened to work out as multiples of 10. Below is a more general version.
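Before generalizing, here is a minimal sketch of the hard-coded fix for the question's data. It assumes ggplot2 3.3.0 or later, where after_stat(y) replaces the ..y.. notation (deprecated in ggplot2 3.4.0); on older versions, write ..y.. * 100 instead. The set.seed() call is an addition for reproducibility only.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed only to make the example reproducible
test.data <- data.frame(values = sample(0:10, 1000, replace = TRUE))

# The secondary axis divides left-axis values by 100, so the ECDF
# (which runs from 0 to 1) is multiplied by 100 to share the left-axis scale.
p <- ggplot(test.data, aes(x = values)) +
  geom_bar() +
  stat_ecdf(aes(y = after_stat(y) * 100)) +
  scale_y_continuous(sec.axis = sec_axis(~ . / 100, name = "percentage"))

print(p)
```

The formula is passed positionally to sec_axis(), which works both in versions where the argument is called trans and in newer ones where it is called transform.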
It would also be nice to determine the transformation factor programmatically, so that we don't have to pick it by hand after seeing the bar heights in the plot. To do that, we calculate the height of the highest bar outside ggplot and use that value (called max_y below) in the plot. We also use the pretty function to round max_y up to the highest y-axis break value at or above the tallest bar, so that the primary and secondary y-axis breaks line up. (ggplot2's default breaks actually come from scales::extended_breaks() rather than pretty, so the two may not always pick the same top break.)

Finally, we use aes_ and bquote to create a quoted call, so that ggplot will recognize the passed max_y value.
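A sketch of the general version described above, under a few assumptions: it uses after_stat() (ggplot2 3.3.0 or later) instead of ..y.., and a plain aes() call instead of aes_() and bquote(). Since ggplot2 3.0.0, aes() captures variables such as max_y from the calling environment through tidy evaluation, and aes_() is deprecated. The set.seed() call and the scales::percent labels are illustrative additions, not part of the original answer.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed only to make the example reproducible
test.data <- data.frame(values = sample(0:10, 1000, replace = TRUE))

# Height of the tallest bar, computed outside ggplot
max_y <- max(table(test.data$values))

# Round max_y up to the largest "nice" break from pretty(), so the
# top of the ECDF (1, i.e. 100%) lands on a round left-axis value
max_y <- max(pretty(c(0, max_y)))

p <- ggplot(test.data, aes(x = values)) +
  geom_bar() +
  # aes() looks up max_y in this environment via tidy evaluation,
  # so no aes_()/bquote() quoting is needed
  stat_ecdf(aes(y = after_stat(y) * max_y)) +
  scale_y_continuous(
    sec.axis = sec_axis(~ . / max_y, name = "percentage",
                        labels = scales::percent)
  )

print(p)
```

The ECDF is scaled by max_y and the secondary axis divides by the same max_y, so the two transformations stay inverse to each other whatever the bar heights turn out to be.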