Python Pandas: Multiple aggregations of the same c

Posted 2019-01-08 12:06

Question:

Given the following (totally overkill) data frame example

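import datetime
import numpy as np
import pandas

df = pandas.DataFrame({
    "date": [datetime.date(2012, x, 1) for x in range(1, 11)],  # first day of Jan..Oct 2012
    "returns": 0.05 * np.random.randn(10),                      # random returns
    "dummy": np.repeat(1, 10),                                  # single group key
})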

is there an existing built-in way to apply two different aggregating functions to the same column, without having to call agg multiple times?

The intuitively right way to do it, which doesn't actually work, would be:

# Assume `function1` and `function2` are defined for aggregating.
df.groupby("dummy").agg({"returns":function1, "returns":function2})

Obviously, a Python dict can't hold duplicate keys: the second "returns" entry silently replaces the first, so only function2 would be applied. Is there any other way of expressing the input to agg? Perhaps a list of tuples [(column, function)] would work better, to allow multiple functions to be applied to the same column? But it seems that agg only accepts a dictionary.

Is there a workaround for this besides defining an auxiliary function that just applies both of the functions inside of it? (How would this work with aggregation anyway?)
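
For reference, the auxiliary-function route mentioned above could look roughly like the sketch below. The helper name both_stats and the seeded toy data are assumptions made for illustration, not part of the original post. The helper returns a Series with one entry per statistic, and groupby(...).apply assembles those into one row per group.

```python
import numpy as np
import pandas as pd

# Toy frame shaped like the one in the question (seeded so it is reproducible).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "returns": 0.05 * rng.standard_normal(10),
    "dummy": np.repeat(1, 10),
})

def both_stats(group):
    """Hypothetical helper: compute two aggregates of 'returns' for one group."""
    return pd.Series({
        "mean": group["returns"].mean(),
        "sum": group["returns"].sum(),
    })

# Select only the 'returns' column so the helper never sees the grouping key.
result = df.groupby("dummy")[["returns"]].apply(both_stats)
print(result)
```

This works, but every new statistic means editing the helper, which is why passing several functions to agg directly is usually more convenient.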

Answer 1:

You can simply pass the functions as a list:

In [20]: df.groupby("dummy").agg({"returns": [np.mean, np.sum]})
Out[20]: 
        returns          
            sum      mean

dummy                    
1      0.285833  0.028583
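
A self-contained version of the list form, as a minimal sketch. The seeded data is an assumption for reproducibility. It also uses the string names "mean" and "sum" in place of np.mean and np.sum, because recent pandas releases may emit a warning when NumPy callables are passed to agg. The result has a two-level column index, so a single value is addressed with a (column, function) tuple.

```python
import numpy as np
import pandas as pd

# Toy frame shaped like the one in the question (seeded so it is reproducible).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "returns": 0.05 * rng.standard_normal(10),
    "dummy": np.repeat(1, 10),
})

# Several aggregations of the same column: pass them as a list.
out = df.groupby("dummy").agg({"returns": ["mean", "sum"]})

# Columns are a MultiIndex: ("returns", "mean") and ("returns", "sum").
mean_value = out.loc[1, ("returns", "mean")]
sum_value = out.loc[1, ("returns", "sum")]
print(out)
```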

or as a dictionary:

In [21]: df.groupby('dummy').agg({'returns':
                                  {'Mean': np.mean, 'Sum': np.sum}})
Out[21]: 
        returns          
            Sum      Mean
dummy                    
1      0.285833  0.028583
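
Note that the nested-dictionary renaming shown above was deprecated in pandas 0.20 and removed in pandas 1.0, where it raises a SpecificationError. On pandas 0.25 or later, named aggregation gives the same custom labels. A minimal sketch, with seeded toy data assumed for reproducibility:

```python
import numpy as np
import pandas as pd

# Toy frame shaped like the one in the question (seeded so it is reproducible).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "returns": 0.05 * rng.standard_normal(10),
    "dummy": np.repeat(1, 10),
})

# Named aggregation: output_name=(input_column, function).
out = df.groupby("dummy").agg(
    Mean=("returns", "mean"),
    Sum=("returns", "sum"),
)
print(out)
```

Unlike the list form, this produces flat column labels ("Mean" and "Sum") rather than a two-level column index.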


Answer 2:

Would something like this work?

In [7]: df.groupby('dummy').returns.agg({'func1' : lambda x: x.sum(), 'func2' : lambda x: x.prod()})
Out[7]: 
              func2     func1
dummy                        
1     -4.263768e-16 -0.188565
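
Passing a dictionary of new names to agg on a single selected column was likewise removed in pandas 1.0. A rough equivalent on newer versions is to pass the names as keyword arguments. In the sketch below, the strings "sum" and "prod" stand in for the two lambdas, and the seeded toy data is an assumption for reproducibility.

```python
import numpy as np
import pandas as pd

# Toy frame shaped like the one in the question (seeded so it is reproducible).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "returns": 0.05 * rng.standard_normal(10),
    "dummy": np.repeat(1, 10),
})

# Keyword form on a single column: output_name=function.
out = df.groupby("dummy")["returns"].agg(func1="sum", func2="prod")
print(out)
```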