Predicting via Lowess in R (OR reconciling Loess & Lowess)

I'm trying to interpolate/locally extrapolate some salary data to fill out a data set.

Here's the data set and a plot of the available data:

    experience   salary
 1:          1 21878.67
 2:          2 23401.33
 3:          3 23705.00
 4:          4 24260.00
 5:          5 25758.60
 6:          6 26763.40
 7:          7 27920.00
 8:          8 28600.00
 9:          9 28820.00
10:         10 32600.00
11:         12 30650.00
12:         14 32600.00
13:         15 32600.00
14:         16 37700.00
15:         17 33380.00
16:         20 36784.33
17:         23 35600.00
18:         25 33590.00
19:         30 32600.00
20:         31 33920.00
21:         35 32600.00

[Plot: raw data]

Given the clear nonlinearity, I'm hoping to interpolate and extrapolate (I want to fill in salaries for 0 through 40 years of experience) with a local linear estimator, so I defaulted to lowess, which gives this:

[Plot: lowess fit]

This looks nice on the plot, but lowess only returns fitted values at the observed experience levels; the gaps in the data (e.g. 11, 13, 18 years) are simply bridged by R's plotting device drawing lines between those points. I haven't been able to find a predict method for lowess; it seems R is moving toward loess, which as I understand it is a generalization.
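Since lowess() only returns fitted points at the observed x values, one workaround is to interpolate between those points yourself. A minimal sketch using base R's approxfun(), assuming the data sit in a plain data frame called dat (the question printed a data.table, but `$` access works the same way); the names fit_lw, lowess_pred, x_new and salary_hat are illustrative. Note that rule = 2 extrapolates flat, holding the end values constant below 1 and above 35 years, rather than extending the local line:

```r
# Salary data from the question (assumed: a plain data frame named dat)
dat <- data.frame(
  experience = c(1:10, 12, 14, 15, 16, 17, 20, 23, 25, 30, 31, 35),
  salary = c(21878.67, 23401.33, 23705.00, 24260.00, 25758.60, 26763.40,
             27920.00, 28600.00, 28820.00, 32600.00, 30650.00, 32600.00,
             32600.00, 37700.00, 33380.00, 36784.33, 35600.00, 33590.00,
             32600.00, 33920.00, 32600.00)
)

# lowess() (defaults f = 2/3, iter = 3) returns list(x, y) at the sorted observed x
fit_lw <- lowess(dat$experience, dat$salary)

# A predict-like function: linear interpolation between the fitted points;
# rule = 2 returns the nearest end value outside [min(x), max(x)] (flat extrapolation)
lowess_pred <- approxfun(fit_lw$x, fit_lw$y, rule = 2)

x_new <- 0:40
salary_hat <- lowess_pred(x_new)

# Is the lowess curve itself non-decreasing over the observed range?
all(diff(fit_lw$y) >= 0)
```

Flat extrapolation is a deliberate simplification here: it never introduces a new decline beyond the data, but it also cannot continue an upward trend past 35 years.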

However, when I use loess (setting surface="direct" to be able to extrapolate, as mentioned in ?loess), which has a standard predict method, the fit is less satisfactory:

[Plot: loess fit]
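For reference, the loess setup described above can be sketched as follows, assuming otherwise default loess() arguments (span = 0.75, degree = 2, family = "gaussian"); the names fit_lo and pred_lo are illustrative. With the default surface = "interpolate", predict() returns NA outside the range of the data, which is why surface = "direct" is needed for 0 years and for 36 to 40 years:

```r
# Salary data from the question (assumed: a plain data frame named dat)
dat <- data.frame(
  experience = c(1:10, 12, 14, 15, 16, 17, 20, 23, 25, 30, 31, 35),
  salary = c(21878.67, 23401.33, 23705.00, 24260.00, 25758.60, 26763.40,
             27920.00, 28600.00, 28820.00, 32600.00, 30650.00, 32600.00,
             32600.00, 37700.00, 33380.00, 36784.33, 35600.00, 33590.00,
             32600.00, 33920.00, 32600.00)
)

# Default loess settings; surface = "direct" computes the local fit exactly at
# each new point, so predict() extrapolates instead of returning NA outside [1, 35]
fit_lo <- loess(salary ~ experience, data = dat,
                control = loess.control(surface = "direct"))

pred_lo <- predict(fit_lo, newdata = data.frame(experience = 0:40))
```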

(There are strong theoretical reasons to expect salary to be non-decreasing in experience; some noise or possible mismeasurement is driving the hump (inverted-U) shape here.)

I also haven't been able to tune any of the parameters to recover the non-decreasing fit that lowess gives.
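One likely source of the mismatch is that the two functions use different defaults: lowess() fits local lines (degree 1) to a fraction f = 2/3 of the data with 3 robustifying iterations, while loess() defaults to span = 0.75, degree = 2 and a non-robust gaussian fit. A sketch that moves loess() toward the lowess() settings is below; the names fit_lo2 and pred_lo2 are illustrative, and it is an assumption that aligning these arguments closes most of the gap. It is not guaranteed to reproduce lowess() exactly (the implementations differ in details), nor that the extrapolated part stays non-decreasing:

```r
# Salary data from the question (assumed: a plain data frame named dat)
dat <- data.frame(
  experience = c(1:10, 12, 14, 15, 16, 17, 20, 23, 25, 30, 31, 35),
  salary = c(21878.67, 23401.33, 23705.00, 24260.00, 25758.60, 26763.40,
             27920.00, 28600.00, 28820.00, 32600.00, 30650.00, 32600.00,
             32600.00, 37700.00, 33380.00, 36784.33, 35600.00, 33590.00,
             32600.00, 33920.00, 32600.00)
)

# Local linear (degree = 1), span 2/3, robust biweight re-weighting
# (family = "symmetric"), exact surface so that predict() can extrapolate
fit_lo2 <- loess(salary ~ experience, data = dat,
                 span = 2/3, degree = 1, family = "symmetric",
                 control = loess.control(surface = "direct"))
pred_lo2 <- predict(fit_lo2, newdata = data.frame(experience = 0:40))

# Largest gap between the two smoothers at the observed x values
# (dat is already sorted by experience, matching the sorted x from lowess)
fit_lw <- lowess(dat$experience, dat$salary)
max(abs(predict(fit_lo2) - fit_lw$y))

# Check monotonicity of the extended curve rather than assuming it
all(diff(pred_lo2) >= 0)
```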

Any suggestions for what to do?

Tags: r interpolation predict loess extrapolation

1 Answer

啃猪蹄的小仙女

Post #2 · 2019-06-02 12:04

I don't know how to "reconcile" those two functions, but I have used the cobs package (COnstrained B-Splines Nonparametric Regression Quantiles) with some success for similar tasks. It fits a quantile regression spline, by default for the (local) median, i.e. the 0.5 quantile, and it can impose shape constraints such as monotonicity. For this dataset the default smoothing choices (automatic knot selection) seem very appropriate.

    require(cobs)  # COnstrained B-Splines: shape-constrained quantile regression splines
    # Loading required package: cobs
    # Package cobs (1.3-0) attached.  To cite, see citation("cobs")

    # Median (tau = 0.5) regression spline, constrained to be non-decreasing in x
    Rbs <- cobs(x = dat$experience, y = dat$salary, constraint = "increase")
    # qbsks2():
    #  Performing general knot selection ...
    #  Deleting unnecessary knots ...
    Rbs
    # COBS regression spline (degree = 2) from call:
    #    cobs(x = dat$experience, y = dat$salary, constraint = "increase")
    # {tau=0.5}-quantile;  dimensionality of fit: 5 from {5}
    # x$knots[1:4]:  0.999966,  5.000000, 15.000000, 35.000034
    plot(Rbs, lwd = 2.5)

[Plot: output of plot(Rbs)]

It does have a predict method, although it takes idiosyncratic arguments (the new x values are passed as z) because it doesn't support the usual newdata= data-frame interface:

    help(predict.cobs)
    predict(Rbs, z = seq(0, 40, by = 5))
    #        z      fit
    #  [1,]  0 21519.83
    #  [2,]  5 25488.71
    #  [3,] 10 30653.44
    #  [4,] 15 32773.21
    #  [5,] 20 33295.84
    #  [6,] 25 33669.14
    #  [7,] 30 33893.12
    #  [8,] 35 33967.78
    #  [9,] 40 33893.12
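In the predict() output above, the value at z = 40 (33893.12) is lower than at z = 35 (33967.78), so the "increase" constraint evidently does not carry past the last knot (about 35). If flat extrapolation beyond the data is acceptable, one simple post-hoc fix is a running maximum with base R's cummax(). A minimal sketch, assuming the predictions are taken from that output (the vector below is copied from it; in a live session something like `predict(Rbs, z = z)[, "fit"]` would supply it); the names z, fit and fit_mono are illustrative:

```r
# Grid and COBS predictions copied from the predict(Rbs, z = ...) output above
z   <- seq(0, 40, by = 5)
fit <- c(21519.83, 25488.71, 30653.44, 32773.21, 33295.84,
         33669.14, 33893.12, 33967.78, 33893.12)

# Running maximum: every value is at least as large as all earlier ones,
# so the curve cannot dip when extrapolating past the last knot
fit_mono <- cummax(fit)
cbind(z, fit, fit_mono)
```

Only the last value changes here (it is held at 33967.78); where the fit is already increasing, cummax() leaves it untouched.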
