Predicting via Lowess in R (OR reconciling Loess &

Posted 2019-06-02 11:52

I'm trying to interpolate/locally extrapolate some salary data to fill out a data set.

Here's the data set and a plot of the available data:

    experience   salary
 1:          1 21878.67
 2:          2 23401.33
 3:          3 23705.00
 4:          4 24260.00
 5:          5 25758.60
 6:          6 26763.40
 7:          7 27920.00
 8:          8 28600.00
 9:          9 28820.00
10:         10 32600.00
11:         12 30650.00
12:         14 32600.00
13:         15 32600.00
14:         16 37700.00
15:         17 33380.00
16:         20 36784.33
17:         23 35600.00
18:         25 33590.00
19:         30 32600.00
20:         31 33920.00
21:         35 32600.00

Raw Data
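For anyone who wants to reproduce this, the table above can be typed in as a plain data frame. This is only a sketch: the name `dat` is borrowed from the answer below, and `missing_years` is a made-up helper name for the experience values that still need filling in.

```r
# Salary data copied from the table above
dat <- data.frame(
  experience = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15, 16, 17,
                 20, 23, 25, 30, 31, 35),
  salary = c(21878.67, 23401.33, 23705.00, 24260.00, 25758.60, 26763.40,
             27920.00, 28600.00, 28820.00, 32600.00, 30650.00, 32600.00,
             32600.00, 37700.00, 33380.00, 36784.33, 35600.00, 33590.00,
             32600.00, 33920.00, 32600.00)
)

# Years in the target range 0..40 that have no observation
# (missing_years is a hypothetical name, not from the original post)
missing_years <- setdiff(0:40, dat$experience)
```

`plot(salary ~ experience, data = dat)` should then give a scatter plot like the one described here.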

Given the clear nonlinearity, I'm hoping to interpolate & extrapolate (I want to fill in experience for years 0 through 40) via a local linear estimator, so I defaulted to lowess, which gives this:

Lowess

This looks good on the plot, but the values for the missing years are not actually there: lowess only returns fitted values at the observed x values, and R's plotting device has joined them up for us. I haven't been able to find a predict method for lowess; R seems to be moving towards loess, which as I understand it is a generalization.
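One possible workaround, sketched here rather than offered as the standard approach: since lowess returns its fitted points as a list with `x` and `y`, `approxfun` can linearly interpolate between them, which is essentially what the plotted line already shows. With `rule = 2` the curve is extended flat beyond the observed range, so years 0 and 36 to 40 just take the nearest fitted value; that is an assumption of this sketch and not a real extrapolation. The names `lo`, `lo_fun` and `pred` are illustrative.

```r
dat <- data.frame(
  experience = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15, 16, 17,
                 20, 23, 25, 30, 31, 35),
  salary = c(21878.67, 23401.33, 23705.00, 24260.00, 25758.60, 26763.40,
             27920.00, 28600.00, 28820.00, 32600.00, 30650.00, 32600.00,
             32600.00, 37700.00, 33380.00, 36784.33, 35600.00, 33590.00,
             32600.00, 33920.00, 32600.00)
)

# lowess returns list(x, y) with x sorted, evaluated only at observed x
lo <- lowess(dat$experience, dat$salary)

# Interpolating function through the lowess points;
# rule = 2 holds the end values constant outside [1, 35]
lo_fun <- approxfun(lo$x, lo$y, rule = 2)

# Fitted values for every year from 0 to 40
pred <- data.frame(experience = 0:40, salary_hat = lo_fun(0:40))
```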

However, when I use loess (setting surface = "direct" through loess.control so that it can extrapolate, as mentioned in ?loess), which has a standard predict method, the fit is less satisfactory:

Loess

(There are strong theoretical reasons to expect salary to be non-decreasing in experience; the U shape here is driven by noise and possibly mis-measurement.)

I haven't been able to tune any of the parameters to get back the non-decreasing fit given by lowess.
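For reference, here is a sketch of the two loess calls being compared. The defaults of the two functions differ: lowess uses f = 2/3, local linear fits and robustness iterations, while loess defaults to span = 0.75, degree = 2 and family = "gaussian". The second call moves loess towards the lowess settings; the two algorithms still differ in their details, so there is no guarantee the fits will match or that the result is monotone. The object names are illustrative.

```r
dat <- data.frame(
  experience = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15, 16, 17,
                 20, 23, 25, 30, 31, 35),
  salary = c(21878.67, 23401.33, 23705.00, 24260.00, 25758.60, 26763.40,
             27920.00, 28600.00, 28820.00, 32600.00, 30650.00, 32600.00,
             32600.00, 37700.00, 33380.00, 36784.33, 35600.00, 33590.00,
             32600.00, 33920.00, 32600.00)
)
new_x <- data.frame(experience = 0:40)

# loess with its defaults, except surface = "direct" to allow extrapolation
fit_default <- loess(salary ~ experience, data = dat,
                     control = loess.control(surface = "direct"))
pred_default <- predict(fit_default, newdata = new_x)

# loess moved towards lowess's defaults: span 2/3, local linear,
# robust ("symmetric") fitting
fit_lowess_like <- loess(salary ~ experience, data = dat,
                         span = 2/3, degree = 1, family = "symmetric",
                         control = loess.control(surface = "direct"))
pred_lowess_like <- predict(fit_lowess_like, newdata = new_x)
```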

Any suggestions for what to do?

1 answer
啃猪蹄的小仙女
#2 · 2019-06-02 12:04

I don't know how to "reconcile" those two functions, but I have used the cobs package (COnstrained B-Splines Nonparametric Regression Quantiles) with some success for similar tasks. The default quantile is the (local) median, i.e. the 0.5 quantile. For this dataset the default smoothing choices (cobs selects the knots automatically) seem very appropriate.

# dat holds the experience/salary table from the question
require(cobs)
# Loading required package: cobs
# Package cobs (1.3-0) attached.  To cite, see citation("cobs")

Rbs <- cobs(x = dat$experience, y = dat$salary, constraint = "increase")
# qbsks2():
# Performing general knot selection ...
#
# Deleting unnecessary knots ...
Rbs
# COBS regression spline (degree = 2) from call:
#     cobs(x = dat$experience, y = dat$salary, constraint = "increase")
# {tau=0.5}-quantile;  dimensionality of fit: 5 from {5}
# x$knots[1:4]:  0.999966,  5.000000, 15.000000, 35.000034
plot(Rbs, lwd = 2.5)

It does have a predict method, although its arguments are idiosyncratic: the new x values go in z= and not in the usual newdata= data frame:

help(predict.cobs)
predict(Rbs, z = seq(0, 40, by = 5))
#        z      fit
#  [1,]  0 21519.83
#  [2,]  5 25488.71
#  [3,] 10 30653.44
#  [4,] 15 32773.21
#  [5,] 20 33295.84
#  [6,] 25 33669.14
#  [7,] 30 33893.12
#  [8,] 35 33967.78
#  [9,] 40 33893.12
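To fill in every year from 0 to 40 as the question asks, the same call can be run on the full grid. This is a sketch that assumes the cobs package is installed, hence the `requireNamespace` guard. `filled` is an illustrative name, and the column names `z` and `fit` are taken from the output above.

```r
dat <- data.frame(
  experience = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15, 16, 17,
                 20, 23, 25, 30, 31, 35),
  salary = c(21878.67, 23401.33, 23705.00, 24260.00, 25758.60, 26763.40,
             27920.00, 28600.00, 28820.00, 32600.00, 30650.00, 32600.00,
             32600.00, 37700.00, 33380.00, 36784.33, 35600.00, 33590.00,
             32600.00, 33920.00, 32600.00)
)

# Only runs when cobs is available (assumption: installed from CRAN)
if (requireNamespace("cobs", quietly = TRUE)) {
  Rbs <- cobs::cobs(x = dat$experience, y = dat$salary,
                    constraint = "increase")

  # predict() on a cobs fit returns a matrix with columns "z" and "fit"
  pr <- predict(Rbs, z = 0:40)
  filled <- data.frame(experience = pr[, "z"], salary_hat = pr[, "fit"])
}
```

Note that in the output above the fit at z = 40 (33893.12) is slightly below the fit at z = 35 (33967.78), so the values beyond the last knot are worth checking against the non-decreasing requirement. Applying `cummax()` to the fitted values, once they are sorted by experience, is one blunt way to enforce it.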