How do I pivot one DataFrame column to a truth tab

I have one df with a user_id and a category. I'd like to transform this to a truth table for whether or not that user has at least one entry for that category. However, the final table should also include columns for all categories that appear in 'df_list', which may not appear at all in df.

Right now I create the truth table with a groupby + size and then check if any columns are missing, and then manually set those columns to False, but I was wondering if there was a way to accomplish this in the initial groupby step.

Here's an example:

import pandas as pd
df = pd.DataFrame({'user_id': [1,1,1,2,2],
                 'category': ['A', 'B', 'D', 'A', 'F']})
df_list = pd.DataFrame({'category': ['A', 'B', 'C', 'D', 'E', 'F']})

df_truth = df.groupby(['user_id', 'category']).size().unstack(fill_value=0).astype(bool)
#category     A      B      D      F
#user_id                            
#1         True   True   True  False
#2         True  False  False   True

To then get to the desired output I then do:

missing_vals = df_list.category.unique()[~pd.Series(df_list.category.unique()).isin(df_truth.columns)]
for element in missing_vals:
    df_truth.loc[:,element] = False
#category     A      B      D      F      C      E
#user_id                                          
#1         True   True   True  False  False  False
#2         True  False  False   True  False  False

标签： python pandas dataframe group-by pandas-groupby

1条回答

▲ chillily

2楼-- · 2019-06-25 12:09

Option 1
crosstab
I'd recommend converting that column to a categorical dtype. crosstab/pivot will then handle the rest.

i = df.user_id
j = pd.Categorical(df.category, categories=df_list.category)

pd.crosstab(i, j).astype(bool)

col_0       A      B      C      D      E      F
user_id                                         
1        True   True  False   True  False  False
2        True  False  False  False  False   True

Option 2
unstack + reindex
To fix your existing code, you can simplify the second step with reindex:

(df.groupby(['user_id', 'category'])
   .size()
   .unstack(fill_value=0)
   .reindex(df_list.category, axis=1, fill_value=0)
   .astype(bool)
)

category     A      B      C      D      E      F
user_id                                          
1         True   True  False   True  False  False
2         True  False  False  False  False   True

0人赞添加讨论(0) 举报

How do I pivot one DataFrame column to a truth tab

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间