Here is a pandas DataFrame I would like to manipulate:
import pandas as pd
data = {"grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2", ...],
"labels": ["A", "B", "C", "A", "B", "C", "D", ...],
"count": [5, 1, 8, 3, 731, 189, 9, ...]}
df = pd.DataFrame(data)
print(df)
>>> grouping labels count
0 item1 A 5
1 item1 B 1
2 item1 C 8
3 item2 A 3
4 item2 B 731
5 item2 C 189
6 item2 D 9
7 ... ... ....
I would like to "unfold" this dataframe into the following format:
grouping A B C D
item1 5 1 8 3
item2 3 731 189 9
.... ........
How would one do this? I would think that this would work:
pd.pivot_table(df,index=["grouping", "labels"]
but I get the following error:
DataError: No numeric types to aggregate
You put
labels
in the index, but you want it in the columns:Note that this makes the columns a MultiIndex. If you don't want that, explicitly pass
values
:df.pivot_table(index='grouping', columns='labels', values='count')
.Also, note that the kind of reshape you seem to be looking for will only be possible if each combination of grouping and label has exactly one or zero values. If any combination occurs more than once, you need to decide how to aggregate them (e.g., by summing the matching values).
Use
set_index
andunstack
:Output:
Try:
There are four idiomatic
pandas
ways to do this.pivot
set_index
pivot_table
groupby
pivot
set_index
pivot_table
groupby
All yield
timing
With the
groupby
,set_index
, orpivot_table
approach, you can easily fill in missing values withfill_value=0
All yield
Additional thoughts on
groupby
Because we don't require any aggregation. If we wanted to use
groupby
, we can minimize the impact of the implicit aggregation by utilizing a less impactful aggregator.or
timing
groupby