I am running OLS on products by month. While this works fine for a single product, my dataframe contains many products. If I create a groupby object and pass it to OLS, I get an error.
linear_regression_df:
   product_desc  period_num  TOTALS
0     product_a           1      53
3     product_a           2      52
6     product_a           3      50
1     product_b           1      44
4     product_b           2      43
7     product_b           3      41
2     product_c           1      36
5     product_c           2      35
8     product_c           3      34
from pandas import DataFrame, Series
import statsmodels.api as sm
linear_regression_grouped = linear_regression_df.groupby(['product_desc'])
X = linear_regression_grouped['period_num']
y = linear_regression_grouped['TOTALS']
model = sm.OLS(y, X)
results = model.fit()
And I get this error on the sm.OLS() line:
ValueError: unrecognized data structures: <class 'pandas.core.groupby.SeriesGroupBy'>
So how can I go through my dataframe and apply sm.OLS() for each product_desc?
You could do something like this ...
Use get_group to get each individual group and fit an OLS model on each one. In a real case, however, you would also want an intercept term, so the model should be defined slightly differently.
The results with and without an intercept are, of course, very different: without one, the fitted line is forced through the origin.