I want to run an OLS regression using pandas and a groupby.
I am trying the following code:
import pandas as pd
from pandas.stats.api import ols
df=pd.read_csv(r'F:\File.csv')
result=df.groupby(['FID']).apply(lambda x: ols(y=df[x['MEAN']], x=df[x['Accum_Prcp'],x['Accum_HDD']]))
print result
But this raises the following error:
File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\indexing.py", line 1150, in _convert_to_indexer
raise KeyError('%s not in index' % objarr[mask])
KeyError: '[ 0.84978328 0.72115778 0.53965104 0.52955655 0.73372541 0.64617074\n 0.60040938 0.7147218 0.65533535 0.57980322 0.57382068 0.56543435\n 0.70740831 0.9245337 0.54859569 0.6789395 0.7086157 0.3835853\n 0.54924104 0.80813778 0.83758118 0.22673391 0.26594087 0.63650468\n 0.89889911 0.38324657 0.30235986 0.62922678 0.55219822 0.55950705\n 0.71137557 0.53631811 0.70158798 0.87116361 0.93751381 0.91125518\n 0.80020908 0.75301262 0.82391046 0.77483673 0.63069573 0.44954455\n 0.83578862 0.56338649 0.64236039 0.93270243 0.93077291 0.83847668\n 0.8268959 0.85400317 0.74319769 0.94803537 0.97484929 0.45366017\n 0.80823694 0.82028051 0.63960395 0.63015722 0.73132888 0.55570184\n 0.83265402 0.75009687 0.58207032 0.92064804 0.91058008 0.86726397\n 0.89204098 0.95573514 0.75704367 0.80786363 0.87448548 0.7553715\n 0.88965962 0.82828493 0.82423891 0.81034742 0.90104876 0.78875473\n 0.97369268] not in index'
Is there something incorrect in my syntax?
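One possible reading of the KeyError, sketched on a toy frame: inside the lambda, `x` is already the sub-frame for a single FID, so `df[x['MEAN']]` hands the MEAN values to `df` as if they were column labels. The frame below uses made-up values and the helper name `lookup_by_values` is hypothetical; it only reproduces that lookup pattern. The exact wording of the error message depends on the pandas version.

```python
import pandas as pd

# Toy frame reusing two column names from the question (values are illustrative).
df = pd.DataFrame({
    "FID": [1, 1, 2, 2],
    "MEAN": [2.0, 1.7, 2.7, 1.9],
})

def lookup_by_values(frame):
    """Index `frame` with its own MEAN values, as df[x['MEAN']] does."""
    try:
        # The floats in MEAN are treated as column labels, which do not exist.
        frame[frame["MEAN"]]
    except KeyError as exc:
        return "KeyError: " + str(exc)
    return "no error"

message = lookup_by_values(df)
print(message)
```

If this reading is right, the fix is to select columns from `x` by name (for example `x['MEAN']`) rather than indexing `df` with them.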
Without a groupby, the equivalent call would be something like this:
result = ols(y=df['MEAN'], x=df[['Accum_HDD','Accum_Prcp']])
and that works correctly.
My dataframe looks something like this:
FID  Image_Date  MEAN  Accum_Prcp  Accum_HDD
1    19920506    2.0   500.0       1000.0
1    19930506    1.7   450.0       1050.0
2    19920506    2.7   456.0       992.0
2    19930506    1.9   376.0       800.0
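For reference, here is a minimal sketch of a per-group fit on the sample rows above. It is not the `pandas.stats.api.ols` call from the question: that module was removed in later pandas releases, so this sketch swaps in `numpy.linalg.lstsq` as the solver. The helper name `fit_group` and the `intercept` label are assumptions made for illustration. The key point is that the function works on the group it receives instead of indexing back into `df`.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the table above.
df = pd.DataFrame({
    "FID": [1, 1, 2, 2],
    "Image_Date": [19920506, 19930506, 19920506, 19930506],
    "MEAN": [2.0, 1.7, 2.7, 1.9],
    "Accum_Prcp": [500.0, 450.0, 456.0, 376.0],
    "Accum_HDD": [1000.0, 1050.0, 992.0, 800.0],
})

def fit_group(group):
    """Least-squares fit of MEAN on Accum_Prcp and Accum_HDD for one FID."""
    y = group["MEAN"].to_numpy()
    X = group[["Accum_Prcp", "Accum_HDD"]].to_numpy()
    # Prepend a column of ones so the model includes an intercept.
    X = np.column_stack([np.ones(len(X)), X])
    # With only two rows per FID the system is underdetermined;
    # lstsq then returns the minimum-norm solution.
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(coef, index=["intercept", "Accum_Prcp", "Accum_HDD"])

# Select the model columns first so each group holds only what the fit needs.
cols = ["MEAN", "Accum_Prcp", "Accum_HDD"]
result = df.groupby("FID")[cols].apply(fit_group)
print(result)
```

The result is a DataFrame with one row of coefficients per FID. With the real data (many rows per FID) the same function gives an ordinary least-squares fit; a statistics package would be needed for standard errors and other diagnostics.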