OLS Regression with groupby

Posted 2019-05-23 02:42

Question:

I want to run an OLS regression using pandas and a groupby.

I am trying the following code:

import pandas as pd
from pandas.stats.api import ols

df=pd.read_csv(r'F:\File.csv')
result=df.groupby(['FID']).apply(lambda x: ols(y=df[x['MEAN']], x=df[x['Accum_Prcp'],x['Accum_HDD']]))
print result

but this returns:

File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\indexing.py", line 1150, in _convert_to_indexer
    raise KeyError('%s not in index' % objarr[mask])

    KeyError: '[ 0.84978328  0.72115778  0.53965104  0.52955655  0.73372541  0.64617074\n  0.60040938  0.7147218   0.65533535  0.57980322  0.57382068  0.56543435\n  0.70740831  0.9245337   0.54859569  0.6789395   0.7086157   0.3835853\n  0.54924104  0.80813778  0.83758118  0.22673391  0.26594087  0.63650468\n  0.89889911  0.38324657  0.30235986  0.62922678  0.55219822  0.55950705\n  0.71137557  0.53631811  0.70158798  0.87116361  0.93751381  0.91125518\n  0.80020908  0.75301262  0.82391046  0.77483673  0.63069573  0.44954455\n  0.83578862  0.56338649  0.64236039  0.93270243  0.93077291  0.83847668\n  0.8268959   0.85400317  0.74319769  0.94803537  0.97484929  0.45366017\n  0.80823694  0.82028051  0.63960395  0.63015722  0.73132888  0.55570184\n  0.83265402  0.75009687  0.58207032  0.92064804  0.91058008  0.86726397\n  0.89204098  0.95573514  0.75704367  0.80786363  0.87448548  0.7553715\n  0.88965962  0.82828493  0.82423891  0.81034742  0.90104876  0.78875473\n  0.97369268] not in index'

Is there something incorrect in my syntax?

Doing this without a groupby would be something like this:

result = ols(y=df['MEAN'], x=df[['Accum_HDD','Accum_Prcp']])

and that works correctly.

My dataframe looks something like this:

FID  Image_Date   MEAN  Accum_Prcp   Accum_HDD
1     19920506     2.0   500.0        1000.0
1     19930506     1.7   450.0        1050.0
2     19920506     2.7   456.0        992.0
2     19930506     1.9   376.0        800.0 

Answer 1:

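Inside the lambda, x is already the sub-DataFrame for one FID, so df[x['MEAN']] tries to use the MEAN values as column labels, which is what raises the KeyError. Index each group directly instead: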

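# Fit one regression per FID; grp is the sub-DataFrame for that group.
results = {}
grps = df.groupby(['FID'])
for fid, grp in grps:
    results[fid] = ols(y=grp.loc[:, 'MEAN'], x=grp.loc[:, ['Accum_Prcp', 'Accum_HDD']])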
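
Note that pandas.stats.api.ols was removed in pandas 0.20, so the loop above only runs on older installs. As a rough sketch of the same per-group fit on a current pandas, the snippet below solves each group's least-squares problem with numpy.linalg.lstsq (statsmodels is the more usual replacement if you need standard errors and summaries). The helper name fit_group and the data values are made up for illustration; the data is built so that MEAN is an exact linear function of the two predictors, which makes the recovered coefficients easy to check.

```python
import numpy as np
import pandas as pd


def fit_group(grp):
    """Fit MEAN ~ intercept + Accum_Prcp + Accum_HDD on one group's rows."""
    X = grp[["Accum_Prcp", "Accum_HDD"]].to_numpy(dtype=float)
    X = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    y = grp["MEAN"].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares solution
    return pd.Series(coef, index=["intercept", "Accum_Prcp", "Accum_HDD"])


# Hypothetical data: two FIDs, four rows each, generated from known coefficients.
true_coefs = {1: (1.0, 0.002, -0.001), 2: (0.5, 0.001, 0.002)}
predictors = [(500.0, 1000.0), (450.0, 1050.0), (480.0, 900.0), (400.0, 950.0)]
rows = []
for fid, (b0, b1, b2) in true_coefs.items():
    for prcp, hdd in predictors:
        rows.append({"FID": fid, "Accum_Prcp": prcp, "Accum_HDD": hdd,
                     "MEAN": b0 + b1 * prcp + b2 * hdd})
df = pd.DataFrame(rows)

# One fit per FID; each grp contains only that FID's rows.
params = {fid: fit_group(grp) for fid, grp in df.groupby("FID")}
coef_table = pd.DataFrame(params).T  # one row per FID, one column per coefficient
print(coef_table)
```

Each group needs at least as many rows as fitted coefficients (three here) for the fit to be determined; with only two rows per FID, as in the sample shown in the question, lstsq would return just one of many exact solutions.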