Pivot a pandas DataFrame to be the correct format:

Here is a pandas DataFrame I would like to manipulate:

import pandas as pd

data = {"grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2", ...],
        "labels": ["A", "B", "C", "A", "B", "C", "D", ...],
        "count": [5, 1, 8, 3, 731, 189, 9, ...]}

df = pd.DataFrame(data)

print(df)
>>>   grouping            labels       count
0        item1             A            5
1        item1             B            1
2        item1             C            8
3        item2             A            3
4        item2             B          731
5        item2             C          189
6        item2             D            9
7        ...               ...         ....

I would like to "unfold" this dataframe into the following format:

grouping    A    B    C    D
item1       5    1    8    3
item2       3    731  189  9
....        ........

How would one do this? I would think that this would work:

pd.pivot_table(df,index=["grouping", "labels"]

but I get the following error:

DataError: No numeric types to aggregate

标签： python pandas dataframe pivot

4条回答

倾城　Initia

2楼-- · 2019-01-28 11:47

You put labels in the index, but you want it in the columns:

>>> df.pivot_table(index='grouping', columns='labels')
         count                   
labels       A      B      C    D
grouping                         
item1      5.0    1.0    8.0  NaN
item2      3.0  731.0  189.0  9.0

Note that this makes the columns a MultiIndex. If you don't want that, explicitly pass values: df.pivot_table(index='grouping', columns='labels', values='count').

Also, note that the kind of reshape you seem to be looking for will only be possible if each combination of grouping and label has exactly one or zero values. If any combination occurs more than once, you need to decide how to aggregate them (e.g., by summing the matching values).

0人赞添加讨论(0) 举报

乱世女痞

3楼-- · 2019-01-28 11:57

Use set_index and unstack:

df = df.set_index(['grouping','labels']).unstack().rename_axis(None)
df.columns = df.columns.droplevel()
print(df)

Output:

labels  A    B    C     D
item1   5    1    8  None
item2   3  731  189     9

0人赞添加讨论(0) 举报

够拽才男人

4楼-- · 2019-01-28 12:04

Try:

In [1]: import pandas as pd
   ...: 
   ...: data = {"grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2"],
   ...:         "labels": ["A", "B", "C", "A", "B", "C", "D"],
   ...:         "count": [5, 1, 8, 3, 731, 189, 9]}
   ...: 
In [2]: df = pd.DataFrame(data)
In [3]: df.pivot_table(index="grouping",columns="labels")

Out[3]: 
             count              
    labels       A    B    C   D
    grouping                    
    item1        5    1    8 NaN
    item2        3  731  189   9

0人赞添加讨论(0) 举报

一夜七次

5楼-- · 2019-01-28 12:05

There are four idiomatic pandas ways to do this.

No duplicates among grouping columns. Does not require aggregation
- pivot
- set_index
Duplicates among grouping columns. Does require aggregation
- pivot_table
- groupby

pivot

df.pivot('grouping', 'labels', 'count')

set_index

df.set_index(['grouping', 'labels'])['count'].unstack()

pivot_table

df.pivot_table('count', 'grouping', 'labels')

groupby

df.groupby(['grouping', 'labels'])['count'].sum().unstack()

All yield

labels      A      B      C    D
grouping                        
item1     5.0    1.0    8.0  NaN
item2     3.0  731.0  189.0  9.0

timing

With the groupby, set_index, or pivot_table approach, you can easily fill in missing values with fill_value=0

df.pivot_table('count', 'grouping', 'labels', fill_value=0)

df.groupby(['grouping', 'labels'])['count'].sum().unstack(fill_value=0)

df.set_index(['grouping', 'labels'])['count'].sum().unstack(fill_value=0)

All yield

labels    A    B    C  D
grouping                
item1     5    1    8  0
item2     3  731  189  9

Additional thoughts on groupby

Because we don't require any aggregation. If we wanted to use groupby, we can minimize the impact of the implicit aggregation by utilizing a less impactful aggregator.

df.groupby(['grouping', 'labels'])['count'].max().unstack()

df.groupby(['grouping', 'labels'])['count'].first().unstack()

timing groupby

0人赞添加讨论(0) 举报

Pivot a pandas DataFrame to be the correct format:

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间