How would you plot a box plot and specific points

Posted 2020-07-18 03:48

Question:

We can draw a box plot as below:

qplot(factor(cyl), mpg, data = mtcars, geom = "boxplot")

and points as:

qplot(factor(cyl), mpg, data = mtcars, geom = "point") 

How would you combine both, but show only a few specific points (say, those where wt is less than 2) on top of the box?

Answer 1:

Use + geom_point(...) on your qplot (just add a + geom_point() to get all the points plotted).

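To plot selectively, give geom_point only the rows that you want to plot. Subsetting the data (rather than indexing inside aes()) matters because recent ggplot2 versions reject aesthetics whose length differs from the layer's data:

library(ggplot2)
n <- nrow(mtcars)
# plot every second point
idx <- seq(1,n,by=2)

qplot( factor(cyl), mpg, data=mtcars, geom="boxplot" ) +
     geom_point( data=mtcars[idx, ] )    # <-- see [idx] ? x and y are inherited from qplot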
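
For the condition in the question (wt less than 2), the same idea can probably be written with a logical filter instead of an index vector. This is a minimal sketch, not the answerer's code: the names light and p are made up for illustration, and it uses ggplot() rather than qplot() on the assumption that a recent ggplot2 is installed.

```r
library(ggplot2)

# Rows to highlight: cars with wt below 2 (wt is in units of 1000 lbs in mtcars).
# `light` is a hypothetical name chosen for this sketch.
light <- subset(mtcars, wt < 2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  # The point layer inherits x and y but draws only the filtered rows.
  geom_point(data = light, colour = "red", size = 2)

print(p)
```

Because the filter lives in the data argument, changing the condition (for example to wt < 2.5) only requires editing the subset() call.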

If you know the points beforehand, you can feed them in directly as a small data frame, e.g.:

qplot( factor(cyl), mpg, data=mtcars, geom="boxplot" ) +
     geom_point( data=data.frame(cyl=c(4,6,8), mpg=c(15,20,25)) ) # plot (4,15),(6,20),...


Answer 2:

If you are trying to plot two geoms with two different datasets (boxplot for mtcars, points for a data.frame of literal values), this is a way to do it that makes your intent clear. This works with the version of ggplot2 that was current as of Sep 2016 (ggplot2_2.1.0).

library(ggplot2)
ggplot() +
  # box plot of mtcars (mpg vs cyl)
  geom_boxplot(data = mtcars, 
               aes(x = factor(cyl), y= mpg)) +
  # points of data.frame literal
  geom_point(data = data.frame(x = factor(c(4,6,8)), y = c(15,20,25)),
             aes(x=x, y=y),
             color = 'red')

I threw in a color = 'red' for the set of points, so it's easy to distinguish them from the outlier points that geom_boxplot draws itself.
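
If colour alone is not enough to tell the overlaid points from the boxplot's own outlier points, one option is to hide the outliers. The sketch below assumes that hiding them is acceptable for your plot, since it removes information. The data frame pts and the object p are names made up for this example. The outlier.shape argument is part of geom_boxplot.

```r
library(ggplot2)

# Hypothetical points to overlay, same literal values as in the answer above.
pts <- data.frame(cyl = c(4, 6, 8), mpg = c(15, 20, 25))

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  # outlier.shape = NA stops geom_boxplot from drawing its own outlier points.
  geom_boxplot(outlier.shape = NA) +
  # Only the rows of `pts` are drawn by this layer.
  geom_point(data = pts, colour = "red")

print(p)
```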



Answer 3:

You can show both by using ggplot() rather than qplot(). The syntax may be a little harder to understand, but you can usually get much more done. If you want to plot both the box plot and the points you can write:

boxpt <- ggplot(data = mtcars, aes(factor(cyl), mpg)) 
boxpt + geom_boxplot(aes(factor(cyl), mpg)) + geom_point(aes(factor(cyl), mpg))

I don't know what you mean by only plotting specific points on top of the box, but if you want a cheap (and probably not very smart) way of just showing points above the edge of the box, here it is:

library(plyr)  # for ddply()
# per cyl, keep mpg values above the 75th percentile
top <- ddply(mtcars, .(cyl), summarise, mpg = mpg[mpg > quantile(mpg, 0.75)])
boxpt + geom_boxplot(aes(factor(cyl), mpg)) + geom_point(data = top, aes(factor(cyl), mpg))

Basically it's the same thing, except that the data supplied to geom_point is restricted to the mpg values in the top quarter of the distribution for each cylinder count. In general I'm not sure this is good practice, because I think people expect to see points beyond the whiskers only, but there you go.
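
If you would rather not depend on plyr, the same per-group filter can probably be done in base R with ave(). This is a sketch under that assumption, not the code from the answer. The names q75, top and p are made up for illustration.

```r
library(ggplot2)

# 75th percentile of mpg within each cyl group, repeated for every row of mtcars.
q75 <- ave(mtcars$mpg, mtcars$cyl, FUN = function(v) quantile(v, 0.75))

# Keep only the rows whose mpg lies above their own group's 75th percentile.
top <- mtcars[mtcars$mpg > q75, ]

p <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot() +
  # The point layer inherits x and y and draws only the filtered rows.
  geom_point(data = top)

print(p)
```

Unlike the ddply() version, top keeps every column of mtcars, so other variables remain available for colouring or labelling the points.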