I wrote the following piece of code but I just cannot get the 'predict' method to work:
import statsmodels.api as sm
from statsmodels.formula.api import ols
ols_model = ols('Consumption ~ Disposable_Income', df).fit()
My 'df' is a pandas dataframe with column headings 'Consumption' and 'Disposable_Income'. When I run, for example,
ols_model.predict([1000.0])
I get: "TypeError: list indices must be integers, not str"
When I run, for example,
ols_model.predict(df['Disposable_Income'].values)
I get: "IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices"
I'm very confused because I thought these two formats are precisely what the documentation says - put in an array of values for the x variable. How exactly am I supposed to use the 'predict' method?
This is what my df looks like: two numeric columns, 'Consumption' and 'Disposable_Income'.
Since you specified the model with a formula, the formula information is also used to interpret the exog you pass to predict.
I think you need to use a dataframe or a dictionary with the correct name of the explanatory variable(s).
ols_model.predict({'Disposable_Income':[1000.0]})
or something like
import pandas as pd
df_predict = pd.DataFrame([[1000.0]], columns=['Disposable_Income'])
ols_model.predict(df_predict)
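To make the two calls above easy to try, here is a minimal self-contained sketch. The data are made up (an exact linear relation, Consumption = 50 + 0.8 * Disposable_Income) and only stand in for the asker's df; the variable names pred_from_frame and pred_from_dict are mine.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical data standing in for the asker's df (values are made up).
income = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
df = pd.DataFrame({
    "Disposable_Income": income,
    "Consumption": 50.0 + 0.8 * income,  # exact linear relation, easy to check
})

ols_model = ols("Consumption ~ Disposable_Income", df).fit()

# New observations must carry the column name used in the formula.
df_predict = pd.DataFrame({"Disposable_Income": [1000.0]})
pred_from_frame = ols_model.predict(df_predict)

# A dict keyed by the same name should be accepted as well.
pred_from_dict = ols_model.predict({"Disposable_Income": [1000.0]})

print(pred_from_frame)
print(pred_from_dict)
```

Because the formula adds the intercept for you, the new data only needs the 'Disposable_Income' column, not a column of ones.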
Another option is to bypass the formula handling in predict, provided the full design matrix for prediction, including the constant column, is available.
As far as I remember, this should also work:
ols_model.predict([[1, 1000.0]], transform=False)
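To show why the row has to be [1, 1000.0] rather than just [1000.0], here is a rough sketch in plain NumPy of what a full design matrix looks like for this formula. It is only an illustration under made-up data, not statsmodels' own code: the fit uses np.linalg.lstsq, and the names X, X_new and params are mine.

```python
import numpy as np

# Hypothetical data (made up): consumption is exactly 50 + 0.8 * income.
income = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
consumption = 50.0 + 0.8 * income

# Design matrix for 'Consumption ~ Disposable_Income': a column of ones
# (the intercept) followed by the regressor.
X = np.column_stack([np.ones_like(income), income])

# Ordinary least squares coefficients, ordered as [intercept, slope].
params, *_ = np.linalg.lstsq(X, consumption, rcond=None)

# A row to predict must use the same layout as X: [1, x].
X_new = np.array([[1.0, 1000.0]])
predictions = X_new @ params
print(predictions)
```

The leading 1 multiplies the intercept; leaving it out would give a row with the wrong number of columns for the fitted parameters.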
I'm not sure this is the best approach, but after a lot of fiddling around I got the following code to work (it seems a bit clumsy and inefficient):
Say I want to predict the value at X=10 and X=1000:
import statsmodels.api as sm
from statsmodels.formula.api import ols
# Build the model once and keep a handle on it, then fit
regressor = ols('Consumption ~ Disposable_Income', df)
ols_model = regressor.fit()
# Each exog row is [constant, Disposable_Income]
regressor.predict(ols_model.params, exog=[[1,10],[1,1000]])