I'm trying to wrap my head around why a histogram made with qplot changes when I add a facet_grid. You can find the data I used for this example here.
The factors in my facet_grid come from using cut:
library(ggplot2)
data <- read.csv("data.csv")
data$cuts <- cut(data$y, c(0, 50, 100, 500, Inf))
So now I have this:
> summary(data)
       y                 x                cuts
 Min.   :  10.00   Min.   :0.000   (0,50]   :530
 1st Qu.:  20.75   1st Qu.:1.000   (50,100] :179
 Median :  46.00   Median :1.000   (100,500]:258
 Mean   : 110.18   Mean   :0.834   (500,Inf]: 33
 3rd Qu.: 121.00   3rd Qu.:1.000
 Max.   :1526.00   Max.   :1.000
If I look at only the section where cuts=="(0,50]", it looks fine:
qplot(x, data=subset(data, cuts=="(0,50]"))
But when I add a facet grid, the y-axes are all wrong:
qplot(x, data=data) + facet_grid(cuts~., scales="free_y")
Notice that the y-axis on the top facet now only goes up to about 40 instead of over 450. The only facet that seems to be right is (500,Inf].
edit: I'm using ggplot2 0.9.0 in R 2.14.2
It seems to be the ggplot2 facet_grid "ignoring duplicate row" bug, as Sandy indicates. A simple workaround is to add a do-nothing line that makes every row unique, and you get what you want.
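As a rough sketch of that workaround (the column name row_id and the simulated data are my own assumptions for illustration, not the exact line or data from the question), giving every row a unique value might look like this:

```r
library(ggplot2)

# Hypothetical stand-in for the question's data.csv: few distinct
# (y, x) combinations, so the data frame contains many duplicate rows.
set.seed(1)
data <- data.frame(
  y = sample(c(10, 60, 200, 900), 200, replace = TRUE),
  x = sample(c(0, 1), 200, replace = TRUE)
)
data$cuts <- cut(data$y, c(0, 50, 100, 500, Inf))

# Assumed "do-nothing" column: a unique id per row, so no two rows are
# identical. It is never mapped to an aesthetic, so the plot itself is
# otherwise unchanged.
data$row_id <- seq_len(nrow(data))

p <- qplot(x, data = data) + facet_grid(cuts ~ ., scales = "free_y")
print(p)
```

The extra column only matters on the affected 0.9.0 release; on a fixed version it should be harmless and can be dropped.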
ggplot2 0.9.1 was released a day or so ago, and this bug was taken care of in that release. So my answer is to download the ggplot2 0.9.1 bug-fix version.
Also, you may want to play with the right argument in geom_histogram, as in:
geom_histogram(binwidth = 0.2, right = TRUE) +