pandas: pivoting on rank

2019-08-15 01:58发布

Given this data:

pd.DataFrame({'id':['aaa','aaa','abb','abb','abb','acd','acd','acd'],
              'loc':['US','UK','FR','US','IN','US','CN','CN']})

    id loc
0  aaa  US
1  aaa  UK
2  abb  FR
3  abb  US
4  abb  IN
5  acd  US
6  acd  CN
7  acd  CN

How do I pivot it to this:

 id   loc1   loc2   loc3
aaa    US     UK     None
abb    FR     US      IN
acd    US     CN      CN

I am looking for the most idiomatic method.

1条回答
迷人小祖宗
2楼-- · 2019-08-15 02:14

I think you can create new column cols with groupby, cumcount and convert to string by astype, last use pivot:

df['cols'] = 'loc' + (df.groupby('id')['id'].cumcount() + 1).astype(str)
print df
    id loc  cols
0  aaa  US  loc1
1  aaa  UK  loc2
2  abb  FR  loc1
3  abb  US  loc2
4  abb  IN  loc3
5  acd  US  loc1
6  acd  CN  loc2
7  acd  CN  loc3

print df.pivot(index='id', columns='cols', values='loc')
cols loc1 loc2  loc3
id                  
aaa    US   UK  None
abb    FR   US    IN
acd    US   CN    CN

If you want remove index and columns names use rename_axis:

print df.pivot(index='id', columns='cols', values='loc').rename_axis(None)
                                                        .rename_axis(None, axis=1)
    loc1 loc2  loc3
aaa   US   UK  None
abb   FR   US    IN
acd   US   CN    CN

All together, thank you Colin:

print pd.pivot(df['id'], 'loc' + (df.groupby('id').cumcount() + 1).astype(str), df['loc'])
        .rename_axis(None)
        .rename_axis(None, axis=1)

    loc1 loc2  loc3
aaa   US   UK  None
abb   FR   US    IN
acd   US   CN    CN    

I try rank, but I get error in version 0.18.0:

print df.groupby('id')['loc'].transform(lambda x: x.rank(method='first'))
#ValueError: first not supported for non-numeric data
查看更多
登录 后发表回答