Question:

I am currently dealing with a very small data set (20 observations, I know it's terrible), but I need to somehow forecast future values. When I simply regress the dependent variable on time I am able to get a prediction, but when I add lagged or differenced variables it does not predict more than one year into the future. Is this due to having too few observations?

Here is my code for context. The two lines I have commented out result in a better-fitting prediction for the present data, but generate only one future prediction.

use "scrappage.dta", clear
drop if year == 1993
tsappend, add(12)
tsset year, y
reg scrappagerate year
*reg scrappagerate year l.scrappagerate l2.scrappagerate
*reg scrappagerate year d.scrappagerate d2.scrappagerate
predict p
predict yp if year>year(2013)
tsline yp p scrappagerate

Sorry if this is a stupid question; this is my first time using Stata to predict values.

Answer 1:

Take a look here for a solution and explanation. Essentially, you can use arima to estimate a model without AR or MA components (which should be equivalent to OLS with reg) and create the dynamic/recursive forecast:

arima y L(1/2).y, hessian
predict y_dynhat, dyn(tm(2011m2))

Just replace 2011m2 with the last monthly date at which you actually observe y. The hessian option will force the standard errors to match OLS more closely.
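For yearly data like the question's, the same idea can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated (20 yearly observations, 1994 to 2013), the variable names are borrowed from the question, and vce(oim) is used on the assumption that it requests the same observed-information standard errors as the hessian option above. With a yearly tsset, the time constant passed to dynamic() is just the year number. As I read the arima postestimation documentation, lags at or after that year are replaced by the model's own predictions, so dynamic(2014) keeps every observed value in play; passing the last observed year instead would start the recursion one period earlier.

```stata
* Simulated stand-in for the question's data (hypothetical values)
clear
set seed 12345
set obs 20
gen year = 1993 + _n                       // 1994..2013
tsset year, yearly
gen scrappagerate = 5 + 0.1*_n + rnormal(0, 0.5)

* Add 12 empty future years (tsappend needs the data to be tsset first)
tsappend, add(12)

* Regression with a time trend and two lags, no AR or MA terms
arima scrappagerate year L(1/2).scrappagerate, vce(oim)

* One-step-ahead predictions through 2013, recursive forecasts from 2014 on
predict yp_dyn, dynamic(2014)
tsline yp_dyn scrappagerate
```

The forecast should now extend across all 12 appended years rather than stopping after the first one.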

You might consider posting your data on the stats site to see if folks have better modeling advice than OLS.

Answer 2:

Here's your problem:

The reason you're obtaining only one prediction has nothing to do with the predict function itself; it follows from the structure of your data. Say you have N observations. Running tsappend, add(12) gives you N+12 rows, and the lagged variable l1.y is non-missing only as far down as row N+1, because that is the last row whose previous value of y is observed.

Stata's predict function produces a prediction for every row where all predictors are non-missing. Since your independent variable l1.y is populated in row N+1, Stata predicts that observation and nothing after it. (Similarly, predict won't produce a value for the first observation, since the lagged predictor is missing there.)
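A quick way to see this behaviour is to count the out-of-sample predictions after a plain regress with a lag. The sketch below uses simulated data and hypothetical variable names (time, y, yhat_static):

```stata
* 12 observed periods, 12 appended empty periods
clear
set seed 1
set obs 12
gen time = _n
gen y = rnormal(100, 100)
tsset time
tsappend, add(12)

* OLS on one lag, then ordinary (static) predictions
regress y L.y
predict yhat_static

* Only period 13 has a non-missing lag, so only one future prediction exists
count if !missing(yhat_static) & time > 12
list time y yhat_static in 11/15
```

The count comes out as 1: period 13 is the only appended row whose lag is observed, and periods 14 onward stay missing.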

Here's your solution:

In order to get dynamic prediction using OLS regression in Stata, you need to feed this N+1th prediction into an X matrix and use the regression coefficient matrix to predict the N+2 observation. You then iterate.

* Example of how to do dynamic prediction using OLS regression and lagged variables
clear
set obs 12
gen time = _n
gen y = rnormal(100,100)

tsset time
tsappend, add(12)
gen y_lag1 = l1.y

* Establish the regression relationship and save the coefficients
regress y y_lag1
matrix a = r(table)'
matrix beta = a[1..2,1]

* Predict the N+1 value (notice you have y_lag1 in the 13th row)
predict yhat

* Predict the next values
local lag = 1
forval i = 14/24 {
    local last_y = yhat[`i'-`lag']
    matrix xinput = (`last_y', 1)
    * Compute the next forecast from the previous one
    matrix next_y = xinput*beta
    replace yhat = next_y[1,1] in `i'
}

Comparing this with the ARIMA approach (as per Dimitriy V. Masterov's comment) gives nearly identical results.
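One way to check that comparison is sketched below, again on simulated data with hypothetical variable names. It uses a compact variant of the recursion (stored coefficients via _b[] instead of matrices) and vce(oim) for arima; both are my choices for illustration, not taken from either answer. dynamic(13) is picked so that arima, like the loop, uses the actual value of y in period 12 and its own forecasts from period 13 onward.

```stata
clear
set seed 2024
set obs 12
gen time = _n
gen y = rnormal(100, 100)
tsset time
tsappend, add(12)

* OLS with one lag; predict fills periods 2..13
regress y L.y
predict yhat_ols

* Recursive forecast: each new value is built from the previous forecast
forvalues t = 14/24 {
    replace yhat_ols = _b[_cons] + _b[L.y]*yhat_ols[`t'-1] in `t'
}

* Same specification through arima, with dynamic forecasts from period 13
arima y L.y, vce(oim)
predict yhat_arima, dynamic(13)

* Absolute gap between the two forecast paths over the 12 future periods
gen diff = abs(yhat_ols - yhat_arima)
summarize diff if time > 12
```

If the two approaches agree as described, the maximum of diff should be small relative to the scale of y.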