Why there is an extra index when using apply in Pa

2019-07-05 05:37发布

问题:

When I use apply to a user defined function in Pandas, it looks like python is creating an additional array. How could I get rid of it? Here is my code:

def fnc(group):
    x = group.C.values
    out = x[np.where(x < 0)]
    return pd.DataFrame(out)

data = pd.DataFrame({'A':np.random.randint(1, 3, 10),
                     'B':3,
                     'C':np.random.normal(0, 1, 10)})

data.groupby(by=['A', 'B']).apply(fnc).reset_index()

There is this weird Level_2 index created. Is there a way to avoid creating it when running my function?

    A   B   level_2   0
0   1   3   0        -1.054134802
1   1   3   1        -0.691996447
2   2   3   0        -1.068693768
3   2   3   1        -0.080342046
4   2   3   2        -0.181869799

回答1:

As such, you will have no way to avoid level_2 appearing. This is because the result of your grouping is a dataframe with several items in it: pandas is cool enough to understand your wish is to broadcast these items across the grouped keys, yet it is taking the index of the dataframe as an additional level to guarantee coherent output data. So dropping level=-1 at the end of your processing explicitly is expected.

If you want to avoid to reset that extra index, but still have some post processing, another way would be to call transform instead of apply, and get the returned data from fnc being the entire group vector where you put np.nan for results to exclude. Then, your dataframe will not have an extra level, but you'll need to call dropna() afterwards.