Prediction using Fixed Effects

Posted 2019-07-23 18:09

Question:

I have a simple data set to which I applied a simple linear regression model. Now I would like to use fixed effects to improve the model's predictions. I know that I could also create dummy variables, but my real dataset covers more years and has more variables, so I would like to avoid making dummies.

My data and code are similar to this:

data <- read.table(header = TRUE, 
                   stringsAsFactors = FALSE, 
                   text="CompanyNumber ResponseVariable Year ExplanatoryVariable1 ExplanatoryVariable2
                   1 2.5 2000 1 2
                   1 4 2001 3 1
                   1 3 2002 5 7
                   2 1 2000 3 2
                   2 2.4 2001 0 4
                   2 6 2002 2 9
                   3 10 2000 8 3")

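library(lfe)    # felm(), getfe()
library(caret)  # postResample()

# Year fixed effects absorbed by felm(); getfe() extracts one effect per Year level
fe <- getfe(felm(ResponseVariable ~ ExplanatoryVariable1 + ExplanatoryVariable2 | Year, data = data))
fe

# Plain linear regression without Year
lm.1 <- lm(ResponseVariable ~ ExplanatoryVariable1 + ExplanatoryVariable2, data = data)

# In-sample predictions and accuracy measures (RMSE, R-squared, MAE)
prediction <- predict(lm.1, data)
prediction

check_model <- postResample(pred = prediction, obs = data$ResponseVariable)
check_model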

For my real dataset I will make predictions on a test set, but for simplicity I just use the training set here as well.

I would like to make predictions with the help of the fixed effects that I found, but the fixed effects do not seem to be matched to the right observations. Does anyone know how to use fe$effect for this?

prediction_fe<- predict(lm.1, data) + fe$effect

Answer 1:

Here are a few extra comments on your setup and the models that you are running.

The primary model you are fitting is

lm.1<-lm(ResponseVariable ~ ExplanatoryVariable1 + ExplanatoryVariable2, data=data) 

which yields

> lm.1
Call:
lm(formula = ResponseVariable ~ ExplanatoryVariable1 + ExplanatoryVariable2, 
    data = data)

Coefficients:
         (Intercept)  ExplanatoryVariable1  ExplanatoryVariable2  
              0.8901                0.7857                0.1923  

When you run the predict function on this model you get

> predict(lm.1)
       1        2        3        4        5        6        7 
2.060385 3.439410 6.164590 3.631718 1.659333 4.192205 7.752359 
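These predictions are simply the design matrix (a column of 1s for the intercept plus the two explanatory variables) multiplied by the vector of estimated coefficients. The base-R sketch below checks this with `model.matrix()` and `coef()`; it re-creates the question's data so it runs on its own, and `X` and `manual` are just illustrative names.

```r
# Example data from the question
data <- read.table(header = TRUE, text = "
CompanyNumber ResponseVariable Year ExplanatoryVariable1 ExplanatoryVariable2
1 2.5 2000 1 2
1 4 2001 3 1
1 3 2002 5 7
2 1 2000 3 2
2 2.4 2001 0 4
2 6 2002 2 9
3 10 2000 8 3")

lm.1 <- lm(ResponseVariable ~ ExplanatoryVariable1 + ExplanatoryVariable2, data = data)

# Design matrix: intercept column of 1s plus the two explanatory variables
X <- model.matrix(lm.1)

# Prediction = intercept + b1 * x1 + b2 * x2, computed for every row at once
manual <- drop(X %*% coef(lm.1))

# Should be TRUE: identical to predict(lm.1)
all.equal(manual, predict(lm.1))
```

The first element of `manual` is the 2.060385 printed above for observation 1.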

For observation 1 the prediction is 0.8901 + 1*0.7857 + 2*0.1923 = 2.0604 (up to rounding), so all the estimated coefficients, including the intercept, are used in the prediction. The felm model is slightly more complicated because it "factors out" the Year component. Its fit is shown here:

> felm(data = data, ResponseVariable ~ ExplanatoryVariable1 + ExplanatoryVariable2 | Year)
ExplanatoryVariable1 ExplanatoryVariable2 
              0.9726               1.3262 

This corresponds to "correcting for", or conditioning on, Year, so you get the same slope estimates if you fit

> lm(data = data, ResponseVariable ~ ExplanatoryVariable1 + ExplanatoryVariable2 + factor(Year))

Call:
lm(formula = ResponseVariable ~ ExplanatoryVariable1 + ExplanatoryVariable2 + 
    factor(Year), data = data)

Coefficients:
         (Intercept)  ExplanatoryVariable1  ExplanatoryVariable2      factor(Year)2001  
             -2.4848                0.9726                1.3262                0.9105  
    factor(Year)2002  
             -7.0286  

and then discard all coefficients except those for the explanatory variables. Thus you cannot obtain predictions from the coefficients that felm reports on their own, since they lack the intercept and all the year effects; they only give you the effect sizes.
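As an illustration of what "factoring out" Year amounts to with a single fixed-effect factor, and of how the year effects could be added back for prediction, here is a base-R sketch. It demeans the response and both explanatory variables within each Year, fits the slopes on the demeaned data (this should reproduce the felm slopes 0.9726 and 1.3262), recovers one effect per Year as that year's average of the response minus the slope contribution, and then adds each observation's own year effect by indexing on Year. That indexing is what `predict(lm.1, data) + fe$effect` lacks: `fe$effect` has one entry per Year level (three here) rather than one per observation (seven), so R recycles it by position, and lm.1 also uses different slopes from the felm model. I would expect these per-year values to match the `effect` column returned by `getfe()` for a single factor, but the sketch does not call lfe, so treat that as an assumption. `demean`, `demeaned`, `year_fe` and `pred_fe` are illustrative names, not part of any package.

```r
# Example data from the question
data <- read.table(header = TRUE, text = "
CompanyNumber ResponseVariable Year ExplanatoryVariable1 ExplanatoryVariable2
1 2.5 2000 1 2
1 4 2001 3 1
1 3 2002 5 7
2 1 2000 3 2
2 2.4 2001 0 4
2 6 2002 2 9
3 10 2000 8 3")

# Illustrative helper: subtract the group (Year) mean from a numeric vector
demean <- function(x, g) x - ave(x, g)

# Within transformation: remove each Year's mean from the response and regressors
demeaned <- data.frame(
  y  = demean(data$ResponseVariable, data$Year),
  x1 = demean(data$ExplanatoryVariable1, data$Year),
  x2 = demean(data$ExplanatoryVariable2, data$Year)
)

# Slopes from the demeaned data; no intercept, since Year absorbs it
beta <- coef(lm(y ~ 0 + x1 + x2, data = demeaned))

# Contribution of the explanatory variables alone (one value per observation)
xb <- beta[["x1"]] * data$ExplanatoryVariable1 +
  beta[["x2"]] * data$ExplanatoryVariable2

# One effect per Year level: that year's average of what the slopes leave unexplained
year_fe <- sapply(split(data$ResponseVariable - xb, data$Year), mean)

# Add each observation's own year effect, matched by Year rather than by position
pred_fe <- xb + year_fe[as.character(data$Year)]

# Compare with the dummy-variable regression
lm.fe <- lm(ResponseVariable ~ ExplanatoryVariable1 + ExplanatoryVariable2 + factor(Year),
            data = data)
all.equal(as.numeric(beta), as.numeric(coef(lm.fe)[2:3]))
all.equal(as.numeric(pred_fe), as.numeric(predict(lm.fe)))
```

Both `all.equal()` calls should return TRUE. On a test set the same indexing works only for years that also appear in the training data; a year the model has never seen has no estimated effect and would give NA here.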

Hope this helps.