Python Statsmodels: OLS regressor not predicting

Posted 2019-07-11 03:24

Question:

I wrote the following piece of code but I just cannot get the 'predict' method to work:

import statsmodels.api as sm
from statsmodels.formula.api import ols
ols_model = ols('Consumption ~ Disposable_Income', df).fit()

My 'df' is a pandas dataframe with column headings 'Consumption' and 'Disposable_Income'. When I run, for example,

ols_model.predict([1000.0])

I get: "TypeError: list indices must be integers, not str"

When I run, for example,

ols_model.predict(df['Disposable_Income'].values)

I get: "IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices"

I'm very confused, because I thought these two formats are exactly what the documentation describes: pass in an array of values for the x variable. How exactly am I supposed to use the 'predict' method?

This is what my df looks like:

Answer 1:

Since the model was created from a formula, predict also uses that formula information to interpret the exog you pass in.
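To illustrate what "formula information" means here: statsmodels (at least in the versions current when this was written) builds design matrices with the patsy library. A minimal sketch, assuming patsy is installed alongside statsmodels, of how the right-hand side of 'Consumption ~ Disposable_Income' is turned into a design matrix. The variable names new_data and X are made up for illustration.

```python
import numpy as np
import patsy

# The right-hand side of the formula refers to a column *named*
# Disposable_Income, so the new data must provide that name.
# A bare list like [1000.0] has no name for patsy to look up.
new_data = {"Disposable_Income": [1000.0]}
X = patsy.dmatrix("Disposable_Income", new_data)

# patsy adds the constant itself, as an 'Intercept' column
print(X.design_info.column_names)  # ['Intercept', 'Disposable_Income']
print(np.asarray(X))
```
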

I think you need to use a dataframe or a dictionary with the correct name of the explanatory variable(s).

ols_model.predict({'Disposable_Income':[1000.0]})

or something like

import pandas as pd
# one row per new observation; the column name must match the formula
df_predict = pd.DataFrame([[1000.0]], columns=['Disposable_Income'])
ols_model.predict(df_predict)
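A self-contained sketch of both forms, assuming a made-up df (the synthetic numbers below are placeholders, not the asker's data). It checks the formula-based predictions against intercept + slope * x computed by hand from the fitted params. np.asarray is used because, depending on the statsmodels version, predict may return a pandas Series or a plain array.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical stand-in for the asker's df: synthetic data, not real figures
rng = np.random.default_rng(0)
income = np.linspace(500.0, 5000.0, 50)
df = pd.DataFrame({
    "Disposable_Income": income,
    "Consumption": 200.0 + 0.8 * income + rng.normal(0.0, 50.0, income.size),
})
ols_model = ols("Consumption ~ Disposable_Income", df).fit()

# Dict form: the key must match the name used in the formula
pred_dict = np.asarray(ols_model.predict({"Disposable_Income": [1000.0]}))

# DataFrame form: several new x values at once; no constant column is needed,
# because the formula adds the Intercept column itself
df_predict = pd.DataFrame({"Disposable_Income": [10.0, 1000.0, 2500.0]})
pred_df = np.asarray(ols_model.predict(df_predict))

# Same numbers by hand: intercept + slope * x
b0 = ols_model.params["Intercept"]
b1 = ols_model.params["Disposable_Income"]
manual = b0 + b1 * df_predict["Disposable_Income"].to_numpy()

print(pred_dict, pred_df, manual)
```
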

Another option, if the full design matrix for prediction (including the constant) is available, is to skip formula handling in predict.

AFAIR, this should also work:

ols_model.predict([[1, 1000.0]], transform=False)
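A sketch of the transform=False route on made-up data (the synthetic df is an assumption). The columns of the hand-built matrix have to follow the order in model.exog_names, with the constant first; the result should match the formula-based prediction.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical synthetic data standing in for the asker's df
rng = np.random.default_rng(0)
income = np.linspace(500.0, 5000.0, 50)
df = pd.DataFrame({
    "Disposable_Income": income,
    "Consumption": 200.0 + 0.8 * income + rng.normal(0.0, 50.0, income.size),
})
ols_model = ols("Consumption ~ Disposable_Income", df).fit()

# Columns of the design matrix, in the order the params expect
print(ols_model.model.exog_names)  # ['Intercept', 'Disposable_Income']

# Full design matrix by hand: 1 for the constant, then the x value
X_new = np.array([[1.0, 10.0], [1.0, 1000.0]])
raw = np.asarray(ols_model.predict(X_new, transform=False))

# The formula path should give the same predictions
via_formula = np.asarray(ols_model.predict({"Disposable_Income": [10.0, 1000.0]}))
print(raw, via_formula)
```
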



Answer 2:

Not sure if this is the best approach, but after lots and lots of fiddling around, I got this code to work (it seems a bit clumsy and inefficient):

Say I want to predict the value at X=10 and X=1000:

import statsmodels.api as sm
from statsmodels.formula.api import ols
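ols_model = ols('Consumption ~ Disposable_Income', df).fit()
# a second, unfitted model object built from the same formula
regressor = ols('Consumption ~ Disposable_Income', df)
# fitted params plus a full design matrix: a column of 1s for the intercept, then X
regressor.predict(ols_model.params, exog=[[1,10],[1,1000]])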
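The second ols(...) call can probably be avoided: the fitted results object keeps a reference to the model it came from in its .model attribute. A sketch on made-up data (the synthetic df is an assumption) comparing the answer's approach, the reused model, and results-level predict with transform=False; all three should agree.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical synthetic data standing in for the asker's df
rng = np.random.default_rng(0)
income = np.linspace(500.0, 5000.0, 50)
df = pd.DataFrame({
    "Disposable_Income": income,
    "Consumption": 200.0 + 0.8 * income + rng.normal(0.0, 50.0, income.size),
})
ols_model = ols("Consumption ~ Disposable_Income", df).fit()

X_new = [[1, 10], [1, 1000]]  # constant column first, then X

# The answer's approach: a second model object built from the same formula
regressor = ols("Consumption ~ Disposable_Income", df)
via_new_model = np.asarray(regressor.predict(ols_model.params, exog=X_new))

# Reusing the model the results were fitted from
via_fitted_model = np.asarray(ols_model.model.predict(ols_model.params, exog=X_new))

# Results-level predict, skipping the formula transform
via_results = np.asarray(ols_model.predict(X_new, transform=False))

print(via_new_model, via_fitted_model, via_results)
```
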