ggplot boxplots with scatterplot overlay (same variables)

Posted 2019-06-05 15:05

Question:

I'm an undergrad researcher and I've been teaching myself R over the past few months. I just started trying ggplot, and have run into some trouble. I've made a series of boxplots looking at the depth of fish at different acoustic receiver stations. I'd like to add a scatterplot that shows the depths of the receiver stations. This is what I have so far:

library(ggplot2)
data    <- read.csv(".....MPS.csv", header=TRUE)
df      <- data.frame(f1=factor(data$Tagging.location),
                      f2=factor(data$Station), data$Detection.depth)
df2     <- data.frame(f2=factor(data$Station), data$depth)
df$f1f2 <- interaction(df$f1, df$f2)
# give.n is assumed to look like this: it labels each box with its sample size
give.n  <- function(x) c(y = median(x), label = length(x))
plot1   <- ggplot(aes(y = data$Detection.depth, x = f2, fill = f1), data = df) +
                  geom_boxplot() + stat_summary(fun.data = give.n, geom = "text",
                  position = position_dodge(width = 0.75), size = 3) +
                  xlab("MPS Station") + ylab("Depth (m)") +
                  theme(legend.title=element_blank()) + scale_y_reverse() +
                  coord_cartesian(ylim=c(150, -10))
plot2   <- ggplot(aes(y=data$depth, x=f2), data=df2) + geom_point() +
                  scale_y_reverse() + coord_cartesian(ylim=c(150,-10)) +
                  xlab("MPS Station") + ylab("Depth (m)")

Unfortunately, since I'm a new user in this forum, I'm not allowed to upload images of these two plots. My x-axis is "Stations" (which has 12 options) and my y-axis is "Depth" (0-150 m). The boxplots are colour-coded by tagging site (which has 2 options). The depths are coming from two different columns in my spreadsheet, and they cannot be combined into one.
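Because the plots can't be shown, a small simulated data set with the same shape can stand in for MPS.csv when testing the code. This is a minimal sketch only: the column names follow the ones used in the code above, but the station labels (`S1` to `S12`), tagging-site labels (`SiteA`, `SiteB`) and all depth values are invented for illustration.

```r
set.seed(1)
# Hypothetical stand-in for MPS.csv: 12 stations, 2 tagging locations
stations <- paste0("S", 1:12)
sites    <- c("SiteA", "SiteB")
n        <- 240
# One made-up bottom depth per station, between 20 and 150 m
station_depth <- setNames(round(runif(12, 20, 150)), stations)
data <- data.frame(
  Tagging.location = sample(sites, n, replace = TRUE),
  Station          = sample(stations, n, replace = TRUE)
)
# Station depth lives in its own column, looked up for each row
data$depth <- unname(station_depth[data$Station])
# Detection depth: somewhere between the surface and the station depth
data$Detection.depth <- round(runif(n) * data$depth, 1)
str(data)
```

Swapping `read.csv(...)` for a block like this makes the example reproducible for anyone reading the question.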

My goal is to combine those two plots by adding "plot2" (station depth scatterplot) to the "plot1" boxplots (detection depths). They both look at the same variables (depth and station), and they must share the same y-axis scale.

I think I could figure out a messy workaround if I were using base R graphics, but I would like to learn ggplot properly, if possible. Any help is greatly appreciated!

Answer 1:

Update: I was confused by the language used in the original post and wrote a slightly more complicated answer than necessary. Here is the cleaned-up version.

Step 1: Setting up. Here, we make sure the depth values in both data frames have the same variable name (for readability).

df <- data.frame(f1=factor(data$Tagging.location), f2=factor(data$Station), depth=data$Detection.depth)

df2 <- data.frame(f2=factor(data$Station), depth=data$depth)
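Step 1 can be checked on a toy version of `data`. The four rows below are invented; only the column names come from the question. Giving both depth columns the name `depth` means the same `aes(x = f2, y = depth)` mapping works for either data frame.

```r
# Hypothetical miniature of the spreadsheet, using the question's column names
data <- data.frame(
  Tagging.location = c("SiteA", "SiteB", "SiteA", "SiteB"),
  Station          = c("S1", "S1", "S2", "S2"),
  Detection.depth  = c(12.5, 30.0, 48.2, 5.1),
  depth            = c(40, 40, 75, 75)
)
# Detections: one row per detection, with the tagging location as f1
df  <- data.frame(f1 = factor(data$Tagging.location),
                  f2 = factor(data$Station),
                  depth = data$Detection.depth)
# Station depths: same column names (f2, depth) as in df
df2 <- data.frame(f2 = factor(data$Station),
                  depth = data$depth)
names(df)   # "f1" "f2" "depth"
names(df2)  # "f2" "depth"
```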

Step 2: Now you can plot this with the `ggplot()` function and split the detection data by tagging location with the `col = f1` argument. We'll plot the detection data as boxplots, and then plot the depths of the stations as colored points (assuming each station has only one depth). The two layers get their data separately: each data frame is passed to its own `geom_*()` function, instead of a single data frame being given to the main `ggplot()` call. It should look something like this:

library(ggplot2)
ggplot() +
  geom_boxplot(data=df, aes(x=f2, y=depth, col=f1)) +
  geom_point(data=df2, aes(x=f2, y=depth), colour="blue") +
  scale_y_reverse()

In this example, the boxplots represent the detection data and are colored by tagging site. The station depths, however, are plotted separately as points in a single color (blue), so they can be seen clearly in relation to the boxplots.
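Putting the two steps together on simulated data shows that both layers are drawn against one shared, reversed y scale. This is a sketch on random data; the station names and depths and the `SiteA`/`SiteB` labels are made up. Wrapping `df2` in `unique()` is an optional extra: `df2` built from the raw spreadsheet repeats each station's depth on every detection row, and `unique()` keeps a single point per station.

```r
library(ggplot2)
set.seed(42)
# Hypothetical data: 4 stations, 2 tagging sites, 80 detections
station_depth <- c(S1 = 40, S2 = 75, S3 = 110, S4 = 140)
stn <- sample(names(station_depth), 80, replace = TRUE)
df  <- data.frame(f1    = factor(sample(c("SiteA", "SiteB"), 80, replace = TRUE)),
                  f2    = factor(stn),
                  depth = runif(80) * unname(station_depth[stn]))
# One row per station instead of one row per detection
df2 <- unique(data.frame(f2 = factor(stn), depth = unname(station_depth[stn])))
# No data in ggplot(): each geom receives its own data frame
p <- ggplot() +
  geom_boxplot(data = df,  aes(x = f2, y = depth, col = f1)) +
  geom_point(data = df2, aes(x = f2, y = depth), colour = "blue") +
  scale_y_reverse()
print(p)
```

Because both geoms map `depth` to `y`, ggplot2 trains a single y scale on the values from both data frames, so the boxes and the station points are always on the same axis.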

You should be able to adjust the plot from here to suit your needs.
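As one example of adjusting the plot, the axis labels, blank legend title and depth range from the question's own code can be layered onto the combined plot. This sketch again uses invented data in the `df`/`df2` layout from Step 1, switches the boxplots to `fill = f1` as in the question's `plot1`, and copies the `ylim` values from the question unchanged.

```r
library(ggplot2)
set.seed(7)
# Hypothetical data in the same layout as df and df2 from Step 1
station_depth <- c(S1 = 40, S2 = 75, S3 = 110, S4 = 140)
stn <- sample(names(station_depth), 80, replace = TRUE)
df  <- data.frame(f1    = factor(sample(c("SiteA", "SiteB"), 80, replace = TRUE)),
                  f2    = factor(stn),
                  depth = runif(80) * unname(station_depth[stn]))
df2 <- unique(data.frame(f2 = factor(stn), depth = unname(station_depth[stn])))
p <- ggplot() +
  # Boxplots filled by tagging site, as in the question's plot1
  geom_boxplot(data = df, aes(x = f2, y = depth, fill = f1)) +
  # Station depths as a separate layer on the same axes
  geom_point(data = df2, aes(x = f2, y = depth), colour = "blue", size = 3) +
  scale_y_reverse() +
  coord_cartesian(ylim = c(150, -10)) +  # range copied from the question
  xlab("MPS Station") + ylab("Depth (m)") +
  theme(legend.title = element_blank())
print(p)
```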

I've created some dummy data and loaded it into the chart to show you what it would look like. Keep in mind that this is purely random data and doesn't really make sense.