Predict() - Maybe I'm not understanding it

2018-12-31 08:25发布

I posted earlier today about an error I was getting with using the predict function. I was able to get that corrected, and thought I was on the right path.

I have a number of observations (actuals) and I have a few data points that I want to extrapolate or predict. I used lm to create a model, then I tried to use predict with the actual value that will serve as the predictor input.

This code is all repeated from my previous post, but here it is:

df <- read.table(text = '
     Quarter Coupon      Total
1   "Dec 06"  25027.072  132450574
2   "Dec 07"  76386.820  194154767
3   "Dec 08"  79622.147  221571135
4   "Dec 09"  74114.416  205880072
5   "Dec 10"  70993.058  188666980
6   "Jun 06"  12048.162  139137919
7   "Jun 07"  46889.369  165276325
8   "Jun 08"  84732.537  207074374
9   "Jun 09"  83240.084  221945162
10  "Jun 10"  81970.143  236954249
11  "Mar 06"   3451.248  116811392
12  "Mar 07"  34201.197  155190418
13  "Mar 08"  73232.900  212492488
14  "Mar 09"  70644.948  203663201
15  "Mar 10"  72314.945  203427892
16  "Mar 11"  88708.663  214061240
17  "Sep 06"  15027.252  121285335
18  "Sep 07"  60228.793  195428991
19  "Sep 08"  85507.062  257651399
20  "Sep 09"  77763.365  215048147
21  "Sep 10"  62259.691  168862119', header=TRUE)

str(df)
'data.frame':   21 obs. of  3 variables:
 $ Quarter   : Factor w/ 24 levels "Dec 06","Dec 07",..: 1 2 3 4 5 7 8 9 10 11 ...
 $ Coupon: num  25027 76387 79622 74114 70993 ...
 $ Total: num  132450574 194154767 221571135 205880072 188666980 ...

Code:

model <- lm(df$Total ~ df$Coupon, data=df)

> model

Call:
lm(formula = df$Total ~ df$Coupon)

Coefficients:
(Intercept)    df$Coupon  
  107286259         1349 

Predict code (based on previous help):

(These are the predictor values I want to use to get the predicted value)

Quarter = c("Jun 11", "Sep 11", "Dec 11")
Total = c(79037022, 83100656, 104299800)
Coupon = data.frame(Quarter, Total)

Coupon$estimate <- predict(model, newdate = Coupon$Total)

Now, when I run that, I get this error message:

Error in `$<-.data.frame`(`*tmp*`, "estimate", value = c(60980.3823396919,  : 
  replacement has 21 rows, data has 3

My original data frame that I used to build the model had 21 observations in it. I am now trying to predict 3 values based on the model.

I either don't truly understand this function, or have an error in my code.

Help would be appreciated.

Thanks

标签: r lm predict
4条回答
泪湿衣
2楼-- · 2018-12-31 08:59

instead of newdata you are using newdate in your predict code, verify once. and just use Coupon$estimate <- predict(model, Coupon) It will work.

查看更多
残风、尘缘若梦
3楼-- · 2018-12-31 09:05

First, you want to use

model <- lm(Total ~ Coupon, data=df)

not model <-lm(df$Total ~ df$Coupon, data=df).

Second, by saying lm(Total ~ Coupon), you are fitting a model that uses Total as the response variable, with Coupon as the predictor. That is, your model is of the form Total = a + b*Coupon, with a and b the coefficients to be estimated. Note that the response goes on the left side of the ~, and the predictor(s) on the right.

Because of this, when you ask R to give you predicted values for the model, you have to provide a set of new predictor values, ie new values of Coupon, not Total.

Third, judging by your specification of newdata, it looks like you're actually after a model to fit Coupon as a function of Total, not the other way around. To do this:

model <- lm(Coupon ~ Total, data=df)
new.df <- data.frame(Total=c(79037022, 83100656, 104299800))
predict(model, new.df)
查看更多
看淡一切
4楼-- · 2018-12-31 09:13

Thanks Hong, that was exactly the problem I was running into. The error you get suggests that the number of rows is wrong, but the problem is actually that the model has been trained using a command that ends up with the wrong names for parameters.

This is really a critical detail that is entirely non-obvious for lm and so on. Some of the tutorial make reference to doing lines like lm(olive$Area@olive$Palmitic) - ending up with variable names of olive$Area NOT Area, so creating an entry using anewdata<-data.frame(Palmitic=2) can't then be used. If you use lm(Area@Palmitic,data=olive) then the variable names are right and prediction works.

The real problem is that the error message does not indicate the problem at all:

Warning message: 'anewdata' had 1 rows but variable(s) found to have X rows

查看更多
残风、尘缘若梦
5楼-- · 2018-12-31 09:22

To avoid error, an important point about the new dataset is the name of independent variable. It must be the same as reported in the model. Another way is to nest the two function without creating a new dataset

model <- lm(Coupon ~ Total, data=df)
predict(model, data.frame(Total=c(79037022, 83100656, 104299800)))
查看更多
登录 后发表回答