Using groupby and subplots with a pandas DataFrame

Posted 2019-09-10 11:30

Question:

I have a dataframe object with time series data in multiple columns (see below). I am trying to make a graphic with subplots for each of the columns in the dataframe where each subplot has 12 boxplots, one for each month.

I have previously used the following code to make subplots from a dataframe (for bar plots, not boxplots):

import matplotlib.pyplot as plt

labels = df.columns.values
fig, axes = plt.subplots(nrows=3, ncols=4, gridspec_kw=dict(hspace=0.3), figsize=(12, 9), sharex=True, sharey=True)
# Pair each column with one subplot; zip stops at the shorter sequence
for col, ax in zip(labels, axes.flatten()):
    df[col].plot(kind='bar', ax=ax, color='green')

but it does not work as-is when I use a groupby object in place of the dataframe:

grouped = df.groupby(df.index.month)
labels = df.columns.values
fig, axes = plt.subplots(nrows = 3, ncols = 4)
targets = zip(labels, axes.flatten())
for i, (col,ax) in enumerate(targets):
    grouped[col].boxplot(ax=ax, color = 'green', subplots =False)

The problem is that boxplot cannot be called on a 'SeriesGroupBy'.

But even if I use df.plot.box(by=df.index.month) or df.boxplot(by=df.index.month) directly in the plot loop (instead of creating the grouped object separately first), the grouping doesn't seem to be recognized.

Does anyone have suggestions? Thanks!

EDIT: Example data:

               res01      res02      res03     res04      res05      res06
1981-01-31 -16.571927  -4.051575  -8.865433 -0.858423  41.831455 -14.569453   
1981-02-28 -14.672908  -2.004894  -6.151469 -0.448101 -30.476155 -13.572198   
1981-03-31 -10.588504  -1.079251  -3.057215 -0.897639 -19.407469  -6.936018   
1981-04-30 -18.132814  -1.438858   0.028866  0.388591 -24.435158  -8.880159   
1981-05-31  -8.190266  -2.175105  -4.326701 -1.089722 -13.286928 -13.530322   
1981-06-30  -7.857190  -2.861348  -5.046409 -0.013585 -17.134277 -18.153491   
1981-07-31  -0.882391  -4.497572  -9.914211 -1.115400 -27.628329 -33.412025   
1981-08-31  12.876021  -4.969259 -11.849937 -1.205588 -29.825922 -36.093600   
1981-09-30 -43.434015  -8.681070 -14.143496 -4.701924 -32.357578 -25.945754   
1981-10-31  38.656449   3.055204   3.088694  1.425666  12.881002  -7.261655   
1981-11-30  -3.455937  -2.136963  -4.393510  0.472263  10.560834 -11.224297   
1981-12-31  -2.923868  -2.006733  -1.667986 -0.460742  -8.663085 -12.022059   
1982-01-31  19.625548  -2.127550  -4.044511 -0.447382  27.524403  -8.551865   
1982-02-28 -12.424200  -1.931246  -6.055349 -0.448398 -29.979264 -13.166926   
1982-03-31  35.249772  -2.416680  -6.029210 -0.661215 -47.206552 -24.267880   
1982-04-30 -55.008877  -7.160744  -9.331341 -1.040474 -42.029073 -32.618620   
1982-05-31 -17.349030  -3.067463  -6.511664 -0.892260 -40.803273 -29.355429   
1982-06-30  -5.710025  -2.519162 -15.885825 -1.664557 -36.476341 -43.840351   
1982-07-31 -30.790685  -8.042895 -12.381517 -1.339010 -38.542642 -53.612233   
1982-08-31   4.263036   1.270455 -13.225027 -1.431894 -29.160338 -36.575128   
1982-09-30 -17.206044 -14.336086 -13.276423 -1.316164 -32.316961 -43.796818   
1982-10-31  -5.164960  -6.247522 -12.369959 -1.045498  12.716187 -29.489328   
1982-11-30 -25.543948  -2.648465  -5.598642 -0.554379  12.033847 -12.507718   
1982-12-31  -2.971802  -1.982072  -1.225803 -0.335575  -7.452425 -10.182204   
1983-01-31  29.917477  -3.224031  -7.680435 -0.701457  43.068696 -11.812835   
1983-02-28   4.998955  -3.281333 -12.630952 -0.867328 -47.758882 -30.902821   
1983-03-31 -21.483914  -3.219957  -7.321552 -0.756839 -50.798885 -29.858194   
1983-04-30 -23.288018  -2.411159  -5.212307 -0.626141 -49.477692 -22.813129   
1983-05-31   0.317828  -3.181573  -6.915676 -0.855810 -21.701865 -23.165239   
1983-06-30 -23.914567  -7.788987 -18.696691 -2.082176 -35.968441 -50.015002   
1983-07-31 -21.452370  -6.447321 -14.399266 -1.514856 -35.645412 -49.081801   
1983-08-31 -14.721837  -7.266818 -14.439923 -1.499819 -47.237557 -52.978016   
1983-09-30 -18.532760  -3.905781  -7.398113 -0.729630 -16.512127 -23.390976   
1983-10-31  62.864704  -5.903833 -13.910222 -1.143347  21.336868 -26.468803   
1983-11-30 -11.050188  -5.180171 -12.654286 -1.186503  24.885744 -22.581720   
1983-12-31  -9.576725  -6.114298  -7.761357 -1.048323 -23.590444 -37.646843   

Answer 1:

AFAIK, once you group your DF you have to either apply some aggregate (reducing) function or call .groups, which returns a dict mapping each group key to the index labels in that group.

So if you just want to plot the 12 monthly boxplots, IIUC you may try to do it this way (using the seaborn module):

import seaborn as sns
ax = sns.boxplot(data=df, x=df.index.month, y='res01')

With subplots:

import matplotlib.pyplot as plt
import seaborn as sns

labels = df.columns.values
fig, axes = plt.subplots(nrows=3, ncols=4, gridspec_kw=dict(hspace=0.3), figsize=(12, 9), sharex=True, sharey=True)
# One subplot per column, with one box per month in each
for col, ax in zip(labels, axes.flatten()):
    sns.boxplot(data=df, ax=ax, color='green', x=df.index.month, y=col)

PS: I'm not sure, though, that I correctly understood your goal.
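To illustrate the remark about groupby objects at the top of this answer, here is a minimal sketch that uses matplotlib only. It assumes synthetic data in place of the question's dataframe (the random values, the month-start dates and the 2x3 grid are all my own choices, not from the original post). A SeriesGroupBy has no boxplot method, but it can be iterated, and the resulting per-month arrays can be handed to Axes.boxplot:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data (assumption): 3 years of monthly values.
rng = np.random.default_rng(0)
index = pd.date_range("1981-01-01", periods=36, freq="MS")
df = pd.DataFrame(
    rng.normal(size=(36, 6)),
    index=index,
    columns=[f"res{i:02d}" for i in range(1, 7)],
)

grouped = df.groupby(df.index.month)

# .groups maps each group key (the month number) to the index labels in that group.
month_keys = sorted(grouped.groups)

# An aggregation reduces every group to a single row, one per month.
monthly_median = grouped.median()

# Iterating a SeriesGroupBy yields (key, Series) pairs in sorted key order,
# so each column becomes a list of 12 arrays, one per month.
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(12, 8), sharex=True)
box_counts = {}
for col, ax in zip(df.columns, axes.flatten()):
    samples = [values.to_numpy() for _, values in grouped[col]]
    result = ax.boxplot(samples, positions=month_keys)
    ax.set_title(col)
    box_counts[col] = len(result["boxes"])
```

With the real dataframe you would drop the synthetic block and finish with plt.show() or fig.savefig(...). With only three years of data, each box is drawn from just three values, so the whiskers will not tell you much.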



Answer 2:

If seaborn is not an option, creating an additional "month" column and grouping by that will work:

df['month'] = df.index.month
df.boxplot(by='month', figsize=(12,8));
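A possible variation on the above, sketched on synthetic data (the random values and the 2x3 layout are assumptions for illustration): DataFrame.assign adds the month column to a copy, so the original dataframe is left untouched, and the layout argument of DataFrame.boxplot arranges the per-column subplots in a grid.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data (assumption): 3 years of monthly values.
rng = np.random.default_rng(1)
index = pd.date_range("1981-01-01", periods=36, freq="MS")
df = pd.DataFrame(
    rng.normal(size=(36, 6)),
    index=index,
    columns=[f"res{i:02d}" for i in range(1, 7)],
)

# assign() returns a copy with the extra column; df itself is not modified.
with_month = df.assign(month=df.index.month)

# One subplot per numeric column, each with one box per month,
# arranged in a 2x3 grid.
axes = with_month.boxplot(by="month", layout=(2, 3), figsize=(12, 8))
n_axes = np.asarray(axes).size
```

As far as I can tell, when by is given and there are several columns, the call returns an array of Axes, which is why the sketch counts them with np.asarray.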