Piecewise regression with R: plotting the segments

2019-03-08 15:39发布

I have 54 points. They represent offer and demand for products. I would like to show there is a break point in the offer.

First, I sort the x-axis (offer) and remove the values that appears twice. I have 47 values, but I remove the first and last ones (doesn't make sense to consider them as break points). Break is of length 45:

Break<-(sort(unique(offer))[2:46])

Then, for each of these potential break points, I estimate a model and I keep in "d" the residual standard error (sixth element in model summary object).

d<-numeric(45)
for (i in 1:45) {
model<-lm(demand~(offer<Break[i])*offer + (offer>=Break[i])*offer)
d[i]<-summary(model)[[6]] }

Plotting d, I notice that my smaller residual standard error is 34, that corresponds to "Break[34]": 22.4. So I write my model with my final break point:

model<-lm(demand~(offer<22.4)*offer + (offer>=22.4)*offer)

Finally, I'm happy with my new model. It's significantly better than the simple linear one. And I want to draw it:

plot(demand~offer)
i <- order(offer)
lines(offer[i], predict(model,list(offer))[i])

But I have a warning message:

Warning message:
In predict.lm(model, list(offer)) :
  prediction from a rank-deficient fit may be misleading

And more important, the lines are really strange on my plot.

My plot with the supposedly two segments, but not joining

Here are my data:

demand <- c(1155, 362, 357, 111, 703, 494, 410, 63, 616, 468, 973, 235,
            180, 69, 305, 106, 155, 422, 44, 1008, 225, 321, 1001, 531, 143,
            251, 216, 57, 146, 226, 169, 32, 75, 102, 4, 68, 102, 462, 295,
            196, 50, 739, 287, 226, 706, 127, 85, 234, 153, 4, 373, 54, 81,
            18)
offer <- c(39.3, 23.5, 22.4, 6.1, 35.9, 35.5, 23.2, 9.1, 27.5, 28.6, 41.3,
           16.9, 18.2, 9, 28.6, 12.7, 11.8, 27.9, 21.6, 45.9, 11.4, 16.6,
           40.7, 22.4, 17.4, 14.3, 14.6, 6.6, 10.6, 14.3, 3.4, 5.1, 4.1,
           4.1, 1.7, 7.5, 7.8, 22.6, 8.6, 7.7, 7.8, 34.7, 15.6, 18.5, 35,
           16.5, 11.3, 7.7, 14.8, 2, 12.4, 9.2, 11.8, 3.9)

标签: r plot piecewise
3条回答
不美不萌又怎样
2楼-- · 2019-03-08 16:15

The weird lines are simply due to the order in which the points are plotted. The following should look better:

i <- order(offer)
lines(offer[i], predict(model,list(offer))[i])

The warning comes from the fact that the * character is interpreted by lm.

> lm(demand~(offer<22.4)*offer + (offer>=22.4)*offer)
Call:
lm(formula = demand ~ (offer < 22.4) * offer + (offer >= 22.4) * offer)
Coefficients:
            (Intercept)         offer < 22.4TRUE                    offer  
                -309.46                   356.08                    29.86  
      offer >= 22.4TRUE   offer < 22.4TRUE:offer  offer:offer >= 22.4TRUE  
                     NA                   -20.79                       NA  

In addition, (offer<22.4)*offer is a discontinuous function: this is where the discontinuity comes from.

The following should be closer to what you want.

model <- lm(
  demand ~ ifelse(offer<22.4,offer-22.4,0) 
           + ifelse(offer>=22.4,offer-22.4,0) )
查看更多
兄弟一词,经得起流年.
3楼-- · 2019-03-08 16:23

Vincent has you on the right track. The only thing "weird" about the lines in your resulting plot is that lines draws a line between each successive point, which means that "jump" you see if it simply connecting the two ends of each line.

If you don't want that connector, you have to split the lines call into two separate pieces.

Also, I feel like you can simplify your regression a bit. Here's what I did:

#After reading your data into dat
Break <- 22.4
dat$grp <- dat$offer < Break

#Note the addition of the grp variable makes this a bit easier to read
m <- lm(demand~offer*grp,data = dat)
dat$pred <- predict(m)

plot(dat$offer,dat$demand)
dat <- dat[order(dat$offer),]
with(subset(dat,offer < Break),lines(offer,pred))
with(subset(dat,offer >= Break),lines(offer,pred))

which produces this plot:

enter image description here

查看更多
\"骚年 ilove
4楼-- · 2019-03-08 16:26

Here is an easier approach using ggplot2.

require(ggplot2)
qplot(offer, demand, group = offer > 22.4, geom = c('point', 'smooth'), 
   method = 'lm', se = F, data = dat)

EDIT. I would also recommend taking a look at this package segmented which supports automatic detection and estimation of segmented regression models.

enter image description here

UPDATE:

Here is an example that makes use of the R package segmented to automatically detect the breaks

library(segmented)
set.seed(12)
xx <- 1:100
zz <- runif(100)
yy <- 2 + 1.5*pmax(xx - 35, 0) - 1.5*pmax(xx - 70, 0) + 15*pmax(zz - .5, 0) + 
  rnorm(100,0,2)
dati <- data.frame(x = xx, y = yy, z = zz)
out.lm <- lm(y ~ x, data = dati)
o <- segmented(out.lm, seg.Z = ~x, psi = list(x = c(30,60)),
  control = seg.control(display = FALSE)
)
dat2 = data.frame(x = xx, y = broken.line(o)$fit)

library(ggplot2)
ggplot(dati, aes(x = x, y = y)) +
  geom_point() +
  geom_line(data = dat2, color = 'blue')

segmented

查看更多
登录 后发表回答