Calling Functions with Multiple Arguments when usi

Posted 2020-07-18 04:49

When writing functions to be used with groupby.apply or groupby.transform in pandas, if the function takes multiple arguments, then the extra arguments are passed after the function name, separated by commas, rather than in parentheses. For example:

def Transfunc(df, arg1, arg2, arg3):
    return something

GroupedData.transform(Transfunc, arg1, arg2, arg3)

Here the df argument is passed automatically as the first argument.
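A minimal sketch of that calling convention, assuming a toy frame and a hypothetical helper named scale_and_shift (neither comes from the question). When transform is called on a single column, each group's Series arrives as the first argument and the remaining positional arguments are forwarded unchanged:

```python
import pandas as pd

# Hypothetical helper: `values` is supplied by pandas, one group at a time;
# `factor` and `offset` are the extra arguments given after the function name.
def scale_and_shift(values, factor, offset):
    return values * factor + offset

df = pd.DataFrame({"key": ["a", "a", "b"], "x": [1.0, 2.0, 3.0]})

# Extra arguments follow the function, separated by commas.
result = df.groupby("key")["x"].transform(scale_and_shift, 10, 1)
print(result)
```

The same values could be passed by keyword (factor=10, offset=1), which tends to be easier to read when there are several of them.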

However, the same syntax does not seem to be possible when using a function to group the data. Take the following example:

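import numpy as np
from pandas import DataFrame
people = DataFrame(np.random.randn(5, 5), columns=['a', 'b', 'c', 'd', 'e'], index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
people.iloc[2:3, [1, 2]] = np.nan  # row 'Wes', columns 'b' and 'c' (.ix has been removed from pandas)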

def MeanPosition(Ind, df, Column):
    if df[Column][Ind] >= np.mean(df[Column]):
        return 'Greater Group'
    else:
        return 'Lesser Group'
# This function compares the data point at index label Ind in the given column to the mean of that column and returns a group name based on whether it is at least the mean or less than the mean

people.groupby(lambda x: MeanPosition(x, people, 'a')).mean()

The above works just fine, but I can't understand why I have to wrap the function in a lambda. Based on the syntax used with transform and apply, it seems to me that the following should work as well:

people.groupby(MeanPosition, people, 'a').mean()

Can anyone tell me why, or how I can call the function without wrapping it in a lambda?

Thanks

EDIT: I do not think it is possible to group the data by passing a multi-argument function as the key without wrapping that function in a lambda. One possible workaround is to pass an array created by a function, rather than passing the function itself as the key. This works as follows:

def MeanPositionList(df, Column):
    return ['Greater Group' if df[Column][row] >= np.mean(df[Column]) else 'Lesser Group' for row in df.index]

Grouped = people.groupby(np.array(MeanPositionList(people, 'a')))
Grouped.mean()

But then of course it might be better to cut out the middleman function altogether and simply use an array built with a list comprehension.
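As a sketch of that idea, the key array can also be built in one vectorized step with np.where. The small hand-made frame below is an assumption for illustration, standing in for the random data above so that the result is predictable:

```python
import numpy as np
import pandas as pd

# Illustrative data (not from the question).
people = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0, 6.0], "b": [10.0, 20.0, 30.0, 40.0]},
    index=["Joe", "Steve", "Wes", "Jim"],
)

# One label per row: compare column 'a' to its own mean.
keys = np.where(people["a"] >= people["a"].mean(), "Greater Group", "Lesser Group")

# An array with one entry per row is a valid groupby key.
grouped_means = people.groupby(keys).mean()
print(grouped_means)
```

This avoids both the lambda and the per-row Python loop, at the cost of computing the keys before the groupby call.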

1 Answer
对你真心纯属浪费
#2 · 2020-07-18 05:33

Passing arguments to apply just happens to work, because apply passes on all arguments to the target function.

However, groupby takes multiple arguments of its own (see its signature in the documentation), so it's not possible to tell which extra arguments are meant for groupby and which for your function; passing a lambda or a named function is more explicit and the way to go.
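If the goal is only to avoid writing a lambda, one option is functools.partial, which binds the extra arguments up front and leaves a one-argument callable for groupby to call with each index label. This is a sketch under assumptions: the frame is hand-made and mean_position is a renamed, illustrative version of the function from the question.

```python
from functools import partial

import pandas as pd

# Illustrative data (not from the question).
people = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0, 6.0]},
    index=["Joe", "Steve", "Wes", "Jim"],
)

def mean_position(label, df, column):
    # groupby supplies `label` (an index value); df and column are pre-bound.
    if df.loc[label, column] >= df[column].mean():
        return "Greater Group"
    return "Lesser Group"

# Bind the extra arguments so that only the index label remains.
key_func = partial(mean_position, df=people, column="a")
result = people.groupby(key_func).mean()
print(result)
```

Note that the key function runs once per index label, so for large frames a precomputed array of keys is likely to be faster.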

Here is how to do what I think you want (slightly modified, since all the index labels in your example are distinct, so every row would be its own group):

In [22]: def f(x):
   ....:     result = Series('Greater',index=x.index)
   ....:     result[x<x.mean()] = 'Lesser'
   ....:     return result
   ....: 

In [25]: df = DataFrame(np.random.randn(5, 5), columns=['a', 'b', 'c', 'd', 'e'], index=['Joe', 'Joe', 'Wes', 'Wes', 'Travis'])

In [26]: df
Out[26]: 
               a         b         c         d         e
Joe    -0.293926  1.006531  0.289749 -0.186993 -0.009843
Joe    -0.228721 -0.071503  0.293486  1.126972 -0.808444
Wes     0.022887 -1.813960  1.195457  0.216040  0.287745
Wes    -1.520738 -0.303487  0.484829  1.644879  1.253210
Travis -0.061281 -0.517140  0.504645 -1.844633  0.683103

In [27]: df.groupby(df.index.values).transform(f)
Out[27]: 
              a        b        c        d        e
Joe      Lesser  Greater   Lesser   Lesser  Greater
Joe     Greater   Lesser  Greater  Greater   Lesser
Travis  Greater  Greater  Greater  Greater  Greater
Wes     Greater   Lesser  Greater   Lesser   Lesser
Wes      Lesser  Greater   Lesser  Greater  Greater