I'd like to create a ggplot2 histogram in which the plot's limits are equal to the smallest and largest values in the data set, without excluding those values from the actual histogram.
I get the behavior I'm looking for when using base graphics. Specifically, the second histogram below shows all of the same values as the first histogram (i.e., no bins are excluded in the second histogram), even though I've included an xlim argument in the second plot:
min_wt <- min(mtcars$wt)
max_wt <- max(mtcars$wt)
xlim <- c(min_wt, max_wt)
hist(mtcars$wt, breaks = 30, main = "No limits added")
hist(mtcars$wt, breaks = 30, xlim = xlim, main = "Limits added")
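As a rough check of why the base version keeps every bar, the bin edges can be inspected without drawing anything. This is only a sketch: the object name h is my own, and it relies on the documented behavior that xlim in hist() affects only the plotting window, not the breaks.

```r
# Compute the histogram without drawing it.
h <- hist(mtcars$wt, breaks = 30, plot = FALSE)

# Outermost bin edges chosen by hist(); these may extend past the data range.
range(h$breaks)

# The values used as xlim in the plot above.
range(mtcars$wt)

# Every observation is counted, regardless of any xlim passed when plotting.
sum(h$counts) == length(mtcars$wt)
```

Comparing the two ranges shows how far the outer bins reach beyond the limits, even though base graphics still draws them.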
ggplot2 isn't giving me this behavior though:
library(ggplot2)
# Using green colour to make dropped bins easy to see:
p <- ggplot(mtcars, aes(x = wt)) + geom_histogram(colour = "green", bins = 30)
p + ggtitle("No limits added")
p + xlim(xlim) + ggtitle("Limits added")
In the second plot, I lose one of the observations below 2 and two of the observations above 5. I would like to know how to fix this. A few miscellaneous notes:
First, specifying boundary allows me to include the minimum values (i.e., those below 2) in the histogram, but I still don't have a solution for the two values greater than 5 that are getting dropped:
ggplot(mtcars, aes(x = wt)) +
  geom_histogram(bins = 30, colour = "green", boundary = min_wt) +
  xlim(xlim) +
  ggtitle("Limits added with boundary too")
Second, whether the issue appears depends on the value chosen for bins. For example, when I increase bins to 50, I don't get any dropped values:
ggplot(mtcars, aes(x = wt)) +
  geom_histogram(bins = 50, colour = "green", boundary = min_wt) +
  xlim(xlim) +
  ggtitle("Limits added with boundary too, but with bins = 50")
Finally, I believe this issue is related to the one presented on SO here: geom_histogram: wrong bins? and discussed here as well: https://github.com/tidyverse/ggplot2/issues/1651. In other words, I think this issue is related to a "rounding error." I describe this error in more depth in my second post (the one with the graphs shown in it) on this issue: https://github.com/daattali/ggExtra/issues/81.
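To see which bins are affected, the computed bin edges can be compared against the limits. The following is a minimal diagnostic sketch, not a fix. layer_data() is an existing ggplot2 function, while the names wt_limits, p_limited, bins_limited, and edge_outside are my own. It also assumes that a bin edge falling outside the scale limits is censored to NA in the built layer data, which I have not confirmed for every ggplot2 version.

```r
library(ggplot2)

# Same limits as above, under a different (illustrative) name so that
# ggplot2's xlim() function is not masked.
wt_limits <- range(mtcars$wt)

p_limited <- ggplot(mtcars, aes(x = wt)) +
  geom_histogram(bins = 30, colour = "green") +
  xlim(wt_limits)

# layer_data() returns the data computed for a layer: one row per bin here.
bins_limited <- layer_data(p_limited, 1)

# Assumption: an edge pushed outside the limits shows up as NA.
edge_outside <- is.na(bins_limited$xmin) | is.na(bins_limited$xmax)

# Bins that would be dropped when the plot is drawn, if the assumption holds.
bins_limited[edge_outside, c("xmin", "xmax", "count")]
```

If the assumption holds, the rows printed at the end should correspond to the bars missing from the "Limits added" plot.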
Here is my session info:
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.2
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] ggplot2_2.2.1
loaded via a namespace (and not attached):
[1] labeling_0.3 colorspace_1.3-2 scales_0.5.0.9000
[4] compiler_3.4.2 lazyeval_0.2.1 plyr_1.8.4
[7] tools_3.4.2 pillar_1.2.1 gtable_0.2.0
[10] tibble_1.4.2 yaml_2.1.16 Rcpp_0.12.15
[13] grid_3.4.2 rlang_0.2.0.9000 munsell_0.4.3