two y-axes with different scales for two datasets

Posted 2020-02-01 06:56

Question:

I have two datasets (they could be combined into a single one) that share the same x values but have different y values. I want to plot the y values of the first dataset against a y-axis on the left of the plot, and the y values of the second dataset against a y-axis on the right of the same plot. The two y-axes will of course have different scales (the right-hand scale should be "adjusted" according to the y values in the first dataset). The points of the two datasets will be drawn in different colors to distinguish the two scales.

An example is shown below:

library(ggplot2)

d1 = data.frame(x=c(100, 200, 300, 400), y=seq(0.1, 0.4, by=0.1)) # 1st dataset
d2 = data.frame(x=c(100, 200, 300, 400), y=seq(0.8, 0.5, by=-0.1)) # 2nd dataset
p1 = ggplot(data = d1, aes(x=x, y=y)) + geom_point()
p2 = ggplot(data = d2, aes(x=x, y=y)) + geom_point() +
  scale_y_continuous(position = "right")   # draw this plot's y-axis on the right
p1
p2

In ggplot2 I cannot simply do p1 + p2; it fails with the error message "Error: Don't know how to add o to a plot". Please help. Thank you!
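In ggplot2, `+` adds components such as layers, scales and themes to a plot; it does not merge two complete ggplot objects, which is why p1 + p2 fails. A minimal sketch of the closest working alternative, assuming only ggplot2, is to add the second dataset as an extra layer that carries its own data argument. Both datasets then share a single left-hand y-axis, which is exactly the limitation this question is about:

```r
library(ggplot2)

d1 <- data.frame(x = c(100, 200, 300, 400), y = seq(0.1, 0.4, by = 0.1))
d2 <- data.frame(x = c(100, 200, 300, 400), y = seq(0.8, 0.5, by = -0.1))

# Layers, not whole plots, are what `+` adds: the second geom_point()
# brings its own data, but both layers are drawn against one y scale.
p <- ggplot(d1, aes(x = x, y = y)) +
  geom_point(colour = "blue") +
  geom_point(data = d2, colour = "red")
p
```

Getting a second, differently scaled axis on the right therefore requires putting both datasets on one scale and letting a secondary axis relabel it.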

Answer 1:

Up front: this type of graph is a good example of why it took so long for ggplot2 to get a second axis. It can very easily be confusing and lead to misinterpretation, so I'll take pains here to provide multiple indicators of what goes where.

First, sec_axis requires a transformation of the original (primary) axis. This is typically given as an intercept/slope formula such as ~ 2*. + 10, where the period stands for the value being transformed. In this case, I think we can get away with simply ~ 2*.
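To make the formula notation concrete, the sketch below writes the same two transformations as ordinary R functions and applies them to values on the primary axis. The names fwd and fwd_affine are hypothetical, chosen only for illustration; the point is that a primary-axis value v is labelled as fwd(v) on the secondary axis:

```r
# Hypothetical helpers mirroring the formulas `~ 2*. + 10` and `~ 2*.`,
# where `.` stands for a value on the primary (left) axis.
fwd_affine <- function(v) 2 * v + 10
fwd <- function(v) 2 * v

primary <- c(0.1, 0.2, 0.3, 0.4)   # the values covered by d1$y
fwd(primary)                        # 0.2 0.4 0.6 0.8: what the right axis reads
fwd_affine(primary)                 # 10.2 10.4 10.6 10.8
```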

However, this means all data must be plotted on the original axis, so d2$y has to be pre-scaled to d1$y's limits. That is simple enough: apply the inverse of the transformation that will be used in sec_axis (here, divide by 2).
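The inverse step can be checked numerically. A minimal sketch, assuming the ~ 2*. transformation: dividing d2$y by 2 places it on d1's scale, and applying the forward transformation recovers the original values, so the right-hand axis labels the second dataset in its own units. The names fwd and inv are hypothetical:

```r
d2 <- data.frame(x = c(100, 200, 300, 400), y = seq(0.8, 0.5, by = -0.1))

fwd <- function(v) 2 * v   # what sec_axis(~ 2*.) applies to the left axis
inv <- function(y) y / 2   # its inverse, applied to the data before plotting

scaled <- inv(d2$y)
scaled                          # 0.40 0.35 0.30 0.25: inside d1's 0.1 to 0.4 range
all.equal(fwd(scaled), d2$y)    # TRUE: the round trip recovers d2$y
```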

I'm going to combine the data into a single data.frame, though, in order to use ggplot2's grouping.

d1 = data.frame(x=c(100, 200, 300, 400), y=seq(0.1, 0.4, by=0.1)) # 1st dataset
d2 = data.frame(x=c(100, 200, 300, 400), y=seq(0.8, 0.5, by=-0.1)) # 2nd dataset
d1$z <- "data1"
d2$z <- "data2"
d3 <- within(d2, { y = y/2 })
d4 <- rbind(d1, d3)
d4
#     x    y     z
# 1 100 0.10 data1
# 2 200 0.20 data1
# 3 300 0.30 data1
# 4 400 0.40 data1
# 5 100 0.40 data2
# 6 200 0.35 data2
# 7 300 0.30 data2
# 8 400 0.25 data2
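The factor of 2 used for d3 was chosen by hand. When the ranges are less convenient, one option is to derive the slope and intercept from the data. The sketch below assumes a simple linear mapping of the range of d2$y onto the range of d1$y; the helper name lin_map is hypothetical. It returns both the function that pre-scales the data and the inverse that would be passed to sec_axis() (which accepts a function as well as a formula):

```r
d1 <- data.frame(x = c(100, 200, 300, 400), y = seq(0.1, 0.4, by = 0.1))
d2 <- data.frame(x = c(100, 200, 300, 400), y = seq(0.8, 0.5, by = -0.1))

# Hypothetical helper: linear map taking range(from) onto range(to).
lin_map <- function(from, to) {
  a <- diff(range(to)) / diff(range(from))   # slope
  b <- min(to) - a * min(from)               # intercept
  list(
    to_primary   = function(y) a * y + b,    # pre-scale the second dataset
    to_secondary = function(v) (v - b) / a   # transformation for sec_axis()
  )
}

m <- lin_map(d2$y, d1$y)
m$to_primary(d2$y)              # approximately 0.4 0.3 0.2 0.1
m$to_secondary(range(d1$y))     # approximately 0.5 0.8
```

For these particular data the derived slope is 1, so the mapping is a pure shift of -0.4, which is the intercept-only variant discussed at the end of this answer.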

In order to control color in all components, I'll set it manually:

mycolors <- c("data1"="blue", "data2"="red")

Finally, the plot:

library(ggplot2)
ggplot(d4, aes(x=x, y=y, group=z, color=z)) +
  geom_path() +
  geom_point() +
  scale_y_continuous(name="data1", sec.axis = sec_axis(~ 2*., name="data2")) +
  scale_color_manual(name="z", values = mycolors) +
  theme(
    axis.title.y = element_text(color = mycolors["data1"]),
    axis.text.y = element_text(color = mycolors["data1"]),
    axis.title.y.right = element_text(color = mycolors["data2"]),
    axis.text.y.right = element_text(color = mycolors["data2"])
  )

Frankly, though, I don't like the different slopes: the same vertical distance (two grid blocks) represents 0.1 on the blue axis but 0.2 on the red axis. If you're plotting two vastly different "things", this may be fine. If, however, the slopes of the two lines are directly comparable, you might prefer each grid block to represent the same amount on both axes. For this, we'll use a transformation with just an intercept and no change in slope: the in-data.frame transformation becomes y = y - 0.4, and the complementary sec_axis transformation is ~ . + 0.4.
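A sketch of that intercept-only variant, assuming the same d1, d2 and colors as above: the second dataset is shifted down by 0.4 before plotting, and the secondary axis adds 0.4 back, so a grid block means the same amount on both sides:

```r
library(ggplot2)

d1 <- data.frame(x = c(100, 200, 300, 400), y = seq(0.1, 0.4, by = 0.1))
d2 <- data.frame(x = c(100, 200, 300, 400), y = seq(0.8, 0.5, by = -0.1))
d1$z <- "data1"
d2$z <- "data2"

# Shift only (no change in slope): move data2 onto data1's range.
d3 <- within(d2, { y <- y - 0.4 })
d4 <- rbind(d1, d3)

mycolors <- c("data1" = "blue", "data2" = "red")

p <- ggplot(d4, aes(x = x, y = y, group = z, color = z)) +
  geom_path() +
  geom_point() +
  # The secondary axis undoes the shift, so it labels data2 in its own units.
  scale_y_continuous(name = "data1", sec.axis = sec_axis(~ . + 0.4, name = "data2")) +
  scale_color_manual(name = "z", values = mycolors) +
  theme(
    axis.title.y = element_text(color = mycolors["data1"]),
    axis.text.y = element_text(color = mycolors["data1"]),
    axis.title.y.right = element_text(color = mycolors["data2"]),
    axis.text.y.right = element_text(color = mycolors["data2"])
  )
p
```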

PS: hints taken from https://stackoverflow.com/a/45683665/3358272 and https://stackoverflow.com/a/6920045/3358272