I'm building pivot tables in pandas. When I do the groupby step to count distinct observations, passing
aggfunc={"person":{lambda x: len(x.unique())}}
gives me the following error:
'DataFrame' object has no attribute 'unique'
Any ideas how to fix it?
Answer 1:
DataFrames do not have that method; columns in DataFrames do:
df['A'].unique()
Or, to get the names with the number of observations (using the DataFrame given by closedloop):
>>> df.groupby('person').person.count()
Out[80]:
person
0    2
1    3
Name: person, dtype: int64
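If the goal is a distinct count inside the pivot itself, one option is to pass an aggregation that pandas applies to each group's column as a Series. The sketch below assumes the small c0/c1/person sample frame used elsewhere in this thread and uses the string alias 'nunique'; the variable names are only illustrative.

```python
import pandas as pd

# Sample data assumed for illustration (same values as the c0/c1/person frame in this thread).
df = pd.DataFrame({
    'c0': [0, 1, 0, 1, 1],
    'c1': [0, 0, 1, 1, 1],
    'person': [0, 0, 1, 1, 1],
    'c_other': [1, 2, 3, 4, 5],
})

# Count distinct 'person' values per (c0, c1) cell.
# 'nunique' is applied to each group's 'person' column, not to a whole DataFrame.
distinct = pd.pivot_table(df, index='c0', columns='c1',
                          values='person', aggfunc='nunique')

# The same distinct counts via a plain groupby, as a cross-check.
distinct_by_group = df.groupby(['c0', 'c1'])['person'].nunique()

print(distinct)
print(distinct_by_group)
```

With this data every cell contains exactly one distinct person, so all four counts are 1.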
Answer 2:
Rather than removing duplicates inside the pivot table, call df.drop_duplicates() first to drop the duplicate rows selectively.
For example, if you are pivoting with index='c0' and columns='c1', this one extra step yields the correct counts.
In this example the 5th row is a duplicate of the 4th (ignoring the non-pivoted c_other column):
import pandas as pd
data = {'c0':[0,1,0,1,1], 'c1':[0,0,1,1,1], 'person':[0,0,1,1,1], 'c_other':[1,2,3,4,5]}
df = pd.DataFrame(data)
df2 = df.drop_duplicates(subset=['c0','c1','person'])
pd.pivot_table(df2, index='c0',columns='c1',values='person', aggfunc='count')
This correctly outputs
c1  0  1
c0
0   1  1
1   1  1
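To see what the de-duplication step changes, here is a rough side-by-side sketch that pivots the same sample data with and without drop_duplicates. The names raw_counts and dedup_counts are just illustrative.

```python
import pandas as pd

# Same sample data as above; rows 4 and 5 differ only in c_other.
df = pd.DataFrame({
    'c0': [0, 1, 0, 1, 1],
    'c1': [0, 0, 1, 1, 1],
    'person': [0, 0, 1, 1, 1],
    'c_other': [1, 2, 3, 4, 5],
})

# Plain count: the duplicated (c0=1, c1=1, person=1) row is counted twice.
raw_counts = pd.pivot_table(df, index='c0', columns='c1',
                            values='person', aggfunc='count')

# Count after dropping rows that repeat the same (c0, c1, person) combination.
deduped = df.drop_duplicates(subset=['c0', 'c1', 'person'])
dedup_counts = pd.pivot_table(deduped, index='c0', columns='c1',
                              values='person', aggfunc='count')

print(raw_counts)    # cell (c0=1, c1=1) is 2
print(dedup_counts)  # cell (c0=1, c1=1) is 1
```

The subset argument matters here: without it, drop_duplicates compares all columns, and the differing c_other values would keep both rows.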