Connecting points to regression line in plot

Posted 2019-02-14 07:49

Question:

Consider the following data from the ISLR book:

Income <- structure(list(X = 1:30, Education = c(10, 10.4013377926421, 
10.8428093645485, 11.2441471571906, 11.6454849498328, 12.0869565217391, 
12.4882943143813, 12.8896321070234, 13.2909698996656, 13.7324414715719, 
14.133779264214, 14.5351170568562, 14.9765886287625, 15.3779264214047, 
15.7792642140468, 16.2207357859532, 16.6220735785953, 17.0234113712375, 
17.4648829431438, 17.866220735786, 18.2675585284281, 18.7090301003344, 
19.1103678929766, 19.5117056856187, 19.9130434782609, 20.3545150501672, 
20.7558528428094, 21.1571906354515, 21.5986622073579, 22), Income = c(26.6588387834389, 
27.3064353457772, 22.1324101716143, 21.1698405046065, 15.1926335164307, 
26.3989510407284, 17.435306578572, 25.5078852305278, 36.884594694235, 
39.666108747637, 34.3962805641312, 41.4979935356871, 44.9815748660704, 
47.039595257834, 48.2525782901863, 57.0342513373801, 51.4909192102538, 
61.3366205527288, 57.581988179306, 68.5537140185881, 64.310925303692, 
68.9590086393083, 74.6146392793647, 71.8671953042483, 76.098135379724, 
75.77521802986, 72.4860553152424, 77.3550205741877, 72.1187904524136, 
80.2605705009016)), .Names = c("X", "Education", "Income"), class = "data.frame", row.names = c(NA, 
-30L))

I want to reproduce their plot, which connects each point to the regression line to represent the error term.

It is easy to plot the points and the regression line with ggplot2:

library(ggplot2)
ggplot(Income, aes(Education, Income)) + geom_point(color = "red") + geom_smooth(se = FALSE)

But I could not think of an easy way to connect the points to the regression line like they did.

Is there an easy way to do it (with base graphics, ggplot2, or lattice)?

Answer 1:

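Preliminary code

library(ggplot2)

# Fit a loess smoother (the method passed to geom_smooth() below)
# and store its fitted values alongside the data
mod <- loess(Income ~ Education, data = Income)
Income <- transform(Income, Fitted = fitted(mod))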
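The segments drawn in the plots below run from each observed value to its fitted value, so their signed lengths should be the model residuals. A minimal sketch of that check follows. It uses a small simulated data frame `d` (a hypothetical stand-in, not the ISLR values) so that it runs on its own; the `Residual` column is likewise introduced only for illustration.

```r
# Hypothetical stand-in data, simulated for illustration (not the ISLR values)
set.seed(1)
d <- data.frame(Education = seq(10, 22, length.out = 30))
d$Income <- 20 + 5 * (d$Education - 10) + rnorm(30, sd = 5)

# Same preliminary step as above: fit loess and keep the fitted values
mod <- loess(Income ~ Education, data = d)
d <- transform(d, Fitted = fitted(mod))

# Signed vertical distance between each point and the fitted curve
d$Residual <- d$Income - d$Fitted

# Compare with the residuals stored on the loess object
all.equal(unname(d$Residual), unname(residuals(mod)))
```

If the comparison holds, the vertical segments are a direct picture of the error term the question refers to.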

ggplot2 version

ggplot(Income, aes(Education, Income)) + 
  geom_point(color="red") + 
  geom_smooth(se=FALSE, method = "loess") +
  geom_segment(aes(x = Education, y = Income,
                   xend = Education, yend = Fitted))

base graphics version

plot(Income ~ Education, data = Income, type = "p", col = "red",
     cex = 1.25)
points(Fitted ~ Education, data = Income)
lines(Fitted ~ Education, data = Income, col = "blue")
with(Income, segments(Education, Income, Education, Fitted))

Adjust accordingly to suit your aesthetic needs.
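
If a straight least-squares line is wanted instead of the loess curve, the same idea should carry over by swapping `loess()` for `lm()` and passing `method = "lm"` to `geom_smooth()`. The sketch below assumes that substitution; it is a variation on the answer above, not something stated in it. The data frame `d` is again a simulated, hypothetical stand-in for `Income`.

```r
library(ggplot2)

# Hypothetical stand-in data, simulated for illustration (not the ISLR values)
set.seed(1)
d <- data.frame(Education = seq(10, 22, length.out = 30))
d$Income <- 20 + 5 * (d$Education - 10) + rnorm(30, sd = 5)

# Ordinary least-squares fit in place of loess
lin <- lm(Income ~ Education, data = d)
d$Fitted <- fitted(lin)

p <- ggplot(d, aes(Education, Income)) +
  # Vertical segment from each observation to the fitted line
  geom_segment(aes(xend = Education, yend = Fitted)) +
  # Straight regression line, matching the lm() fit above
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Points drawn last so they sit on top of the segments
  geom_point(color = "red")
print(p)
```

Drawing the segments before the points is only a layering choice; reorder the layers if a different look is preferred.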