Selecting an appropriate lag for a regression equa

Posted 2020-07-18 07:35

Question:

My question is two-fold.

How do I select an appropriate lag for my regression equation? I've got a dependent variable of house price, and independent variables of rent, house supply, national stock market index, mortgage rate, and house vacancy rate.

I did some reading and found that VARselect(data, lag.max = n), with n set to 1, 2, 3, etc., can help me select an appropriate lag.

data is loaded from a CSV file with the above variables. Below is what I got. How am I supposed to interpret it?

> var=VARselect(data,lag.max=8)
> var
$selection
AIC(n)  HQ(n)  SC(n) FPE(n) 
     3      3      1      3 

$criteria
          1        2        3        4        5        6        7        8
AIC(n) 1.716881 1.575052 1.474927 1.543878 1.493210 1.651975 1.624066 1.773173
HQ(n)  1.807505 1.726093 1.686385 1.815752 1.825500 2.044682 2.077189 2.286712
SC(n)  1.962629 1.984634 2.048341 2.281125 2.394289 2.716887 2.852810 3.165750
FPE(n) 5.569664 4.841214 4.396341 4.741887 4.556023 5.424803 5.393498 6.451249

I guess, long story short, what I want to find out is: how much should I lag each of rent, house supply, national stock market index, mortgage rate, and house vacancy rate against house price to create a 'good enough' model?

I am open to other methods that help me find out what I should do but please help me out with the code. Thanks.

Answer 1:

Check out the documentation for the vars package, in particular for the VARselect function (same information as ?VARselect, but formatted nicely).

What the $selection object is telling you is the total lag order selected by minimizing each of the 4 criteria (Akaike, Hannan-Quinn, Schwarz, and Final Prediction Error).

What the $criteria object tells you is the value of each criterion at the given lag (so that $criteria[3L, p], for example, is the Schwarz criterion for the specification with p lags). This is useful when several lag orders have similar criterion values: if the minimizer has a very high p but a much lower p gives nearly the same value, you can choose the more parsimonious specification.
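To make the reading of $selection and $criteria concrete, here is a minimal sketch in base R that rebuilds the criteria matrix from the output posted in the question. The helper near_min and the tolerance of 0.05 are hypothetical choices for illustration only, not part of the vars package.

```r
# Criterion values copied from the VARselect() output in the question
# (rows = criteria, columns = lag orders 1..8).
criteria <- rbind(
  "AIC(n)" = c(1.716881, 1.575052, 1.474927, 1.543878,
               1.493210, 1.651975, 1.624066, 1.773173),
  "HQ(n)"  = c(1.807505, 1.726093, 1.686385, 1.815752,
               1.825500, 2.044682, 2.077189, 2.286712),
  "SC(n)"  = c(1.962629, 1.984634, 2.048341, 2.281125,
               2.394289, 2.716887, 2.852810, 3.165750),
  "FPE(n)" = c(5.569664, 4.841214, 4.396341, 4.741887,
               4.556023, 5.424803, 5.393498, 6.451249)
)
colnames(criteria) <- 1:8

# $selection is the column index of the minimum in each row.
selection <- apply(criteria, 1, which.min)
print(selection)

# Schwarz criterion at lag 3, i.e. $criteria[3L, 3].
sc_at_3 <- criteria["SC(n)", 3]

# Hypothetical helper: lag orders whose criterion value lies within
# `tol` of the row minimum (the tolerance is an arbitrary assumption).
near_min <- function(x, tol) which(x <= min(x) + tol)
aic_candidates <- near_min(criteria["AIC(n)", ], tol = 0.05)
print(aic_candidates)
```

Here the row minima reproduce the posted $selection (3, 3, 1, 3), and the near-minimum check shows that lag 5 is almost as good as lag 3 under AIC, so the smaller of the two is the parsimonious choice.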

Please also note that if you just run VARselect(data), it evaluates the criteria for the model fitted jointly to all columns. I'm not sure what you're going for, but from your question it seems you might want to run the lag selection for each column of your data separately. To do so, you'd run lapply(data, VARselect).
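The difference between the joint and per-column calls can be sketched as follows. This assumes the vars package is installed; the data frame is simulated and its column names are hypothetical stand-ins for the asker's CSV, so the selected lags mean nothing beyond illustration.

```r
library(vars)  # assumes the 'vars' package is installed

# Simulated stand-in for the asker's data; column names are hypothetical.
set.seed(1)
n <- 200
data <- data.frame(
  price = as.numeric(arima.sim(list(ar = 0.6), n = n)),
  rent  = as.numeric(arima.sim(list(ar = c(0.5, -0.3)), n = n)),
  rate  = as.numeric(arima.sim(list(ar = 0.4), n = n))
)

# Joint selection: one lag order for the whole VAR system.
joint <- VARselect(data, lag.max = 8)
print(joint$selection)

# Per-column selection: each series is treated on its own,
# i.e. as a univariate autoregression.
per_column <- lapply(data, VARselect, lag.max = 8)

# One column per series, one row per criterion.
per_column_lags <- sapply(per_column, function(res) res$selection)
print(per_column_lags)
```

Note that the per-column call picks an autoregressive order for each series in isolation; it does not by itself say how far to lag rent or mortgage rate against house price.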



Answer 2:

I believe the AIC and SC criteria are the most often used in practice, and AIC in particular is well documented (see: Helmut Lütkepohl, New Introduction to Multiple Time Series Analysis).

The right answer is that there is no one method that is known to give the best result. That's why they are all still in the vars package, presumably.

One way to get a good idea for your own model would be to run the selection above for all variables or for specific subsets, and then see which of the four criteria gives consistent values. Then weigh this against the frequency of your data (daily, weekly, monthly, yearly?) and make an educated decision. If you have monthly data, it is likely that the factors mentioned above do have effects up to 6 months later, e.g. house supply against house prices, as houses aren't built or vacated very quickly.

In case you aren't sure where the information criterion comes into the VAR model: the function VAR from package 'vars' has an ic argument that accepts "AIC", "HQ", "SC", or "FPE". When lag.max is also supplied, VAR chooses the lag order by that criterion.
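As a rough sketch of passing the criterion straight to VAR, assuming the vars package is installed and again using simulated data with hypothetical column names:

```r
library(vars)  # assumes the 'vars' package is installed

# Simulated stand-in data; column names are hypothetical.
set.seed(2)
n <- 200
data <- data.frame(
  price = as.numeric(arima.sim(list(ar = 0.6), n = n)),
  rent  = as.numeric(arima.sim(list(ar = 0.3), n = n))
)

# Let VAR() pick the lag order (up to 8) by the Schwarz criterion.
fit <- VAR(data, lag.max = 8, ic = "SC", type = "const")
print(fit$p)  # lag order actually used in the fitted model

# The same order, obtained explicitly from VARselect().
sel <- VARselect(data, lag.max = 8, type = "const")$selection
print(sel["SC(n)"])
```

Swapping ic = "SC" for "AIC", "HQ", or "FPE" changes only which row of the criteria table drives the choice.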



Tags: r vector var