I'm trying to interpolate/locally extrapolate some salary data to fill out a data set.
Here's the data set and a plot of the available data:
experience salary
1: 1 21878.67
2: 2 23401.33
3: 3 23705.00
4: 4 24260.00
5: 5 25758.60
6: 6 26763.40
7: 7 27920.00
8: 8 28600.00
9: 9 28820.00
10: 10 32600.00
11: 12 30650.00
12: 14 32600.00
13: 15 32600.00
14: 16 37700.00
15: 17 33380.00
16: 20 36784.33
17: 23 35600.00
18: 25 33590.00
19: 30 32600.00
20: 31 33920.00
21: 35 32600.00
Given the clear nonlinearity, I'm hoping to interpolate and extrapolate (I want to fill in salaries for 0 through 40 years of experience) via a local linear estimator, so I defaulted to lowess, which gives this:
This is nice on the plot, but the raw data is missing; R's plotting device has filled in the blanks for us. I haven't been able to find a predict method for this function, as it seems R is moving towards using loess, which as I understand it is a generalization.
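To make the point about missing values concrete, here is a minimal sketch. It assumes lowess was called with its default arguments (the question does not say which were used). lowess returns fitted values only at the observed x values, so this sketch swaps in approx to interpolate linearly between them. That is a stand-in, not a predict method, and it does not extrapolate beyond the observed range of 1 to 35.

```r
# Data from the question
experience <- c(1:10, 12, 14:17, 20, 23, 25, 30, 31, 35)
salary <- c(21878.67, 23401.33, 23705.00, 24260.00, 25758.60, 26763.40,
            27920.00, 28600.00, 28820.00, 32600.00, 30650.00, 32600.00,
            32600.00, 37700.00, 33380.00, 36784.33, 35600.00, 33590.00,
            32600.00, 33920.00, 32600.00)

# Assumption: default lowess settings (f = 2/3, iter = 3)
fit <- lowess(experience, salary)

# fit is a list with components x and y, one entry per observed point
str(fit)

# Linear interpolation of the smoothed values onto every year in range;
# this mimics what the plotted line shows between observed points
grid <- 1:35
smoothed <- approx(fit$x, fit$y, xout = grid)$y
head(data.frame(experience = grid, salary_hat = smoothed))
```

Years 0 and 36 through 40 lie outside the data, so they would still need a method that can extrapolate.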
However, when I use loess (setting surface="direct" to be able to extrapolate, as mentioned in ?loess), which has a standard predict method, the fit is less satisfactory:
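A call of the kind described might look like the sketch below. The data frame name d and the column name salary_hat are placeholders, and the exact arguments used in the question are not known. The second fit is only a variation to try: it sets degree = 1, span = 2/3 and family = "symmetric" to move loess closer to the lowess defaults (local linear fits with robustness iterations). It is not claimed to reproduce the lowess curve or to guarantee a non-decreasing fit.

```r
# Data from the question
d <- data.frame(
  experience = c(1:10, 12, 14:17, 20, 23, 25, 30, 31, 35),
  salary = c(21878.67, 23401.33, 23705.00, 24260.00, 25758.60, 26763.40,
             27920.00, 28600.00, 28820.00, 32600.00, 30650.00, 32600.00,
             32600.00, 37700.00, 33380.00, 36784.33, 35600.00, 33590.00,
             32600.00, 33920.00, 32600.00)
)

# surface = "direct" lets predict() evaluate outside the observed range
fit_default <- loess(salary ~ experience, data = d,
                     control = loess.control(surface = "direct"))

# Variation (an assumption, not the question's settings): local linear,
# robust fitting, and a span matching the lowess default f = 2/3
fit_linear <- loess(salary ~ experience, data = d,
                    degree = 1, span = 2/3, family = "symmetric",
                    control = loess.control(surface = "direct"))

# Predict for every year from 0 through 40
new <- data.frame(experience = 0:40)
new$salary_hat <- predict(fit_default, newdata = new)
new$salary_hat_linear <- predict(fit_linear, newdata = new)
head(new)
```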
(There are strong theoretical reasons to say that salary should be non-decreasing; some noise or possible mis-measurement is driving the U shape here.)
And I can't seem to adjust any of the parameters to get back the non-decreasing fit given by lowess.
Any suggestions for what to do?
I don't know how to "reconcile" those two functions, but I have used the cobs package (COnstrained B-Splines Nonparametric Regression Quantiles) with some success for similar tasks. The default quantile is the (local) median, or 0.5 quantile. In this dataset the default choices for span or kernel width seem very appropriate. It does have a predict method, although you will need to use idiosyncratic arguments since it doesn't support the usual data= formalism:
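A minimal sketch of such a call, assuming the cobs package is installed. The choice constraint = "increase" is an assumption based on the question's requirement that salary be non-decreasing. predict for a cobs fit takes the new x values through its z argument rather than newdata. How it behaves for z values outside the observed range (here 0 and 36 through 40) is worth checking in ?predict.cobs before relying on the extrapolated values.

```r
# Data from the question
experience <- c(1:10, 12, 14:17, 20, 23, 25, 30, 31, 35)
salary <- c(21878.67, 23401.33, 23705.00, 24260.00, 25758.60, 26763.40,
            27920.00, 28600.00, 28820.00, 32600.00, 30650.00, 32600.00,
            32600.00, 37700.00, 33380.00, 36784.33, 35600.00, 33590.00,
            32600.00, 33920.00, 32600.00)

# Run only if cobs is available, so the example fails gracefully otherwise
if (requireNamespace("cobs", quietly = TRUE)) {
  # Median (tau = 0.5 by default) regression spline, constrained to increase
  fit <- cobs::cobs(x = experience, y = salary, constraint = "increase")

  # New x values go in z; there is no newdata/data frame interface
  pred <- predict(fit, z = seq(0, 40, by = 1))
  print(head(pred))
}
```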