We can draw box plot as below:
qplot(factor(cyl), mpg, data = mtcars, geom = "boxplot")
and point as:
qplot(factor(cyl), mpg, data = mtcars, geom = "point")
How would you combine both - but just to show a few specific points(say when wt
is less than 2) on top of the box?
Use + geom_point(...)
on your qplot
(just add a + geom_point()
to get all the points plotted).
To plot selectively just select those points that you want to plot:
n <- nrow(mtcars)
# plot every second point
idx <- seq(1,n,by=2)
qplot( factor(cyl), mpg, data=mtcars, geom="boxplot" ) +
geom_point( aes(x=factor(cyl)[idx],y=mpg[idx]) ) # <-- see [idx] ?
If you know the points before-hand, you can feed them in directly e.g.:
qplot( factor(cyl), mpg, data=mtcars, geom="boxplot" ) +
geom_point( aes(x=factor(c(4,6,8)),y=c(15,20,25)) ) # plot (4,15),(6,20),...
If you are trying to plot two geoms with two different datasets (boxplot for mtcars, points for a data.frame of literal values), this is a way to do it that makes your intent clear. This works with the current (Sep 2016) version of ggplot (ggplot2_2.1.0
)
library(ggplot2)
ggplot() +
# box plot of mtcars (mpg vs cyl)
geom_boxplot(data = mtcars,
aes(x = factor(cyl), y= mpg)) +
# points of data.frame literal
geom_point(data = data.frame(x = factor(c(4,6,8)), y = c(15,20,25)),
aes(x=x, y=y),
color = 'red')
I threw in a color = 'red'
for the set of points, so it's easy to distinguish them from the points generated as part of geom_boxplot
You can show both by using ggplot()
rather than qplot()
. The syntax may be a little harder to understand, but you can usually get much more done. If you want to plot both the box plot and the points you can write:
boxpt <- ggplot(data = mtcars, aes(factor(cyl), mpg))
boxpt + geom_boxplot(aes(factor(cyl), mpg)) + geom_point(aes(factor(cyl), mpg))
I don't know what you mean by only plotting specific points on top of the box, but if you want a cheap (and probably not very smart) way of just showing points above the edge of the box, here it is:
boxpt + geom_boxplot(aes(factor(cyl), mpg)) + geom_point(data = ddply(mtcars, .(cyl),summarise, mpg = mpg[mpg > quantile(mpg, 0.75)]), aes(factor(cyl), mpg))
Basically it's the same thing except for the data supplied to geom_point
is adjusted to include only the mpg numbers in the top quarter of the distribution by cylinder. In general I'm not sure this is good practice because I think people expect to see points beyond the whiskers only, but there you go.