Extract slope of multiple trend lines from geom_sm

2019-02-23 07:34发布

问题:

I am trying to plot multiple trend lines (every ten years) in a time series using ggplot.

Here's the data:

dat <- structure(list(YY = 1961:2010, a = c(98L, 76L, 83L, 89L, 120L, 
107L, 83L, 83L, 92L, 104L, 98L, 91L, 81L, 69L, 86L, 76L, 85L, 
86L, 70L, 81L, 77L, 89L, 60L, 80L, 94L, 66L, 77L, 85L, 77L, 80L, 
79L, 79L, 65L, 70L, 80L, 87L, 84L, 67L, 106L, 129L, 95L, 79L, 
67L, 105L, 118L, 85L, 86L, 103L, 97L, 106L)), .Names = c("YY", 
"a"), row.names = c(NA, -50L), class = "data.frame")

Here's the script:

p <- ggplot(dat, aes(x = YY))
p <- p + geom_line(aes(y=a),colour="blue",lwd=1)
p <- p + geom_point(aes(y=a),colour="blue",size=2)

p <- p + theme(panel.background=element_rect(fill="white"),
         plot.margin = unit(c(0.5,0.5,0.5,0.5),"cm"),
         panel.border=element_rect(colour="black",fill=NA,size=1),
         axis.line.x=element_line(colour="black"),
         axis.line.y=element_line(colour="black"),
         axis.text=element_text(size=15,colour="black",family="serif"),
         axis.title=element_text(size=15,colour="black",family="serif"),
         legend.position = "top")

p <- p + scale_x_discrete(limits = c(seq(1961,2010,5)),expand=c(0,0))

p <- p + geom_smooth(data=dat[1:10,],aes(x=YY,y=a),method="lm",se=FALSE,color="black",formula=y~x,linetype="dashed")

p <- p + geom_smooth(data=dat[11:20,],aes(x=YY,y=a),method="lm",se=FALSE,color="black",formula=y~x,linetype="dashed")

p <- p + geom_smooth(data=dat[21:30,],aes(x=YY,y=a),method="lm",se=FALSE,color="black",formula=y~x,linetype="dashed")

p <- p + geom_smooth(data=dat[31:40,],aes(x=YY,y=a),method="lm",se=FALSE,color="black",formula=y~x,linetype="dashed")

p <- p + geom_smooth(data=dat[41:50,],aes(x=YY,y=a),method="lm",se=FALSE,color="black",formula=y~x,linetype="dashed")

p <- p + labs(x="Year",y="Number of Days")
outImg <- paste0("test",".png")
ggsave(outImg,p,width=8,height=5)

This is the resulting image:

WHAT I WANT/PROBLEMS

  1. I want to extract the slope and add them on the the trend lines in the figure. How can I extract the slope of each line from the geom_smooth()?

  2. Currently, I am plotting the trend lines one by one. I want to know if there is an efficient way of doing this with adjustable time window. Suppose for example, I want to plot the trend lines for every 5 years. In the figure above the time window is 10.

  3. Suppose, I only want to plot the significant trend lines (i.e., p-value < 0.05, null: no trend or slope equals 0), is it possible to implement this with geom_smooth()?

I'll appreciate any help.

回答1:

So, each of these tasks are best handled before you pipe your data into ggplot2, but they are all made fairly easy using some of the other packages from the tidyverse.

Beginning with questions 1 and 2:

While ggplot2 can plot the regression line, to extract the estimated slope coefficients you need to work with the lm() object explicitly. Using group_by() and mutate(), you can add a grouping variable (my code below does this for 5 year groups just for example) and then calculate and extract just the slope estimate into columns in your existing data frame. Then those slope estimates can be plotted in ggplot using the geom_text() call. I've done this below in a quick and dirty way (placing each label at the mean of the x and y values they regress) but you can specify their exact placement in your dataframe.

Grouping variables and data prep makes question 2 a breeze too: now that you have the grouping variables explicitly in your dataframe there is no need to plot one by one, geom_smooth() accepts the group aesthetic.

Additionally, to answer question 3, you can extract the pvalue from the summary of your lm objects and filter out only those that are significant to the level you care about. If you pass this now complete dataframe to geom_smooth() and geom_text() you will get the plot you're looking for!

library(tidyverse)

 # set up our base plot
 p <- ggplot(dat, aes(x = YY, y = a)) +
  geom_line(colour = "blue", lwd = 1) +
  geom_point(colour = "blue", size = 2) +
  theme(
    panel.background = element_rect(fill = "white"),
    plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm"),
    panel.border = element_rect(colour = "black", fill = NA, size = 1),
    axis.line.x = element_line(colour = "black"),
    axis.line.y = element_line(colour = "black"),
    axis.text = element_text(size = 15, colour = "black", family = "serif"),
    axis.title = element_text(size = 15, colour = "black", family = "serif"),
    legend.position = "top"
  ) +
  scale_x_discrete(limits = c(seq(1961, 2010, 5)), expand = c(0, 0))

# add a grouping variable (or many!)
 prep5 <- dat %>%
  mutate(group5 = rep(1:10, each = 5)) %>%
  group_by(group5) %>%
  mutate(
    slope = round(lm(YY ~ a)$coefficients[2], 2),
    significance = summary(lm(YY ~ a))$coefficients[2, 4],
    x = mean(YY),   # x coordinate for slope label
    y = mean(a)     # y coordinate for slope label
  ) %>%
  filter(significance < .2)   # only keep those with a pvalue < .2 

p + geom_smooth(
  data = prep5, aes(x = YY, y = a, group = group5),  # grouping variable does the plots for us!
  method = "lm", se = FALSE, color = "black",
  formula = y ~ x, linetype = "dashed"
) +
  geom_text(
    data = prep5, aes(x = x, y = y, label = slope),
    nudge_y = 12, nudge_x = -1
  )

Now you may want to be a little more careful about specifying the location of your text labels than I have been here. I used means and the nudge_* arguments of geom_text() to do a quick example but keep in mind since these values are mapped explicitly to x and y coordinates, you have complete control!

Created on 2018-07-16 by the reprex package (v0.2.0).