convert list of strings to dummy variables with pa

Posted 2020-06-25 09:07

If I have the following pandas DataFrame:

import pandas as pd

df = pd.DataFrame(columns=['name', 'tags'], data=[
    ['Rob', ['a', 'c']],
    ['Erica', ['b', 'c']]
])

table:

Name   tags
Rob    ['a', 'c']
Erica  ['b', 'c']

How would I convert this into:

Name   tags_a  tags_b  tags_c
Rob    1       0       1
Erica  0       1       1

If each row could have only one tag, I could do this with pd.get_dummies(df, columns=['tags']), but that doesn't work when each value in tags is a list.

Tags: python pandas
3 answers
Reply #2 · 2020-06-25 09:44
import numpy as np

# Use apply to build one row per record: the name followed by a 0/1 flag for each tag
df2 = df.apply(lambda x: [x['name']] + np.isin(('a', 'b', 'c'), x.tags).astype(int).tolist(), axis=1).apply(pd.Series)

# Rename the columns
df2.columns = ['name', 'tags_a', 'tags_b', 'tags_c']

df2
Out[505]: 
    name  tags_a  tags_b  tags_c
0    Rob       1       0       1
1  Erica       0       1       1
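
The snippet above hardcodes the tag set ('a', 'b', 'c'). If the tags are not known in advance, one possible variation is to collect them from the data first. This is only a sketch under that assumption; the names all_tags and indicators are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(columns=['name', 'tags'], data=[
    ['Rob', ['a', 'c']],
    ['Erica', ['b', 'c']]
])

# Collect the distinct tags in sorted order instead of hardcoding them
# (all_tags is a hypothetical helper name)
all_tags = sorted({tag for tags in df['tags'] for tag in tags})

# One row per record: 1 where the tag is present in that record's list, else 0
indicators = np.array([np.isin(all_tags, tags) for tags in df['tags']]).astype(int)

# Attach the indicator columns to the name column, keeping the original index
df2 = pd.concat(
    [df[['name']],
     pd.DataFrame(indicators, columns=['tags_' + t for t in all_tags], index=df.index)],
    axis=1,
)
```

Sorting the tags keeps the column order stable from run to run, which a plain set would not guarantee.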
等我变得足够好
Reply #3 · 2020-06-25 09:56

Use str.get_dummies:

df.tags.str.join('|').str.get_dummies().add_prefix('tags_')

   tags_a  tags_b  tags_c
0       1       0       1
1       0       1       1

To keep the name column, join the result back onto it:

df[['name']].join(df.tags.str.join('|').str.get_dummies().add_prefix('tags_'))

    name  tags_a  tags_b  tags_c
0    Rob       1       0       1
1  Erica       0       1       1
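
One caveat with the join('|') approach: str.get_dummies splits on '|' by default, so a tag that itself contains that character would be split in two. A possible alternative that avoids a separator altogether, assuming pandas 0.25 or later (for Series.explode), might look like the sketch below. The names dummies and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(columns=['name', 'tags'], data=[
    ['Rob', ['a', 'c']],
    ['Erica', ['b', 'c']]
])

# explode() gives one row per (record, tag) and repeats the original index label
exploded = df['tags'].explode()

# One-hot encode each tag, then add the rows back together per original index label
dummies = pd.get_dummies(exploded, prefix='tags', dtype=int).groupby(level=0).sum()

# Re-attach the name column
result = df[['name']].join(dummies)
```

Passing dtype=int keeps the columns as 0/1 integers, since newer pandas versions return booleans from pd.get_dummies by default.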
乱世女痞
Reply #4 · 2020-06-25 10:02
# reorganize data
df = pd.get_dummies(df.set_index('name').tags
                      .apply(pd.Series)
                      .stack()
                   ).unstack()

# remove the multilevel column and collapse counts per name
# (grouping on the transposed frame avoids groupby(axis=1), which newer pandas versions deprecate)
df.columns = df.columns.droplevel(1)
df.T.groupby(level=0).sum().T.add_prefix('tags_')

       tags_a  tags_b  tags_c
name                         
Rob         1       0       1
Erica       0       1       1
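
Because this approach sums counts per name, a tag that appears twice in the same list would produce a 2 rather than a 1. If strict 0/1 indicators are wanted, one option is to cap the counts. The sketch below uses pd.crosstab instead of the stack/unstack pipeline above, and the repeated 'a' in Rob's list is hypothetical data added only to show the effect.

```python
import pandas as pd

# Hypothetical data: Rob's list repeats the tag 'a'
df = pd.DataFrame(columns=['name', 'tags'], data=[
    ['Rob', ['a', 'c', 'a']],
    ['Erica', ['b', 'c']]
])

# One row per (name, tag) pair
exploded = df.explode('tags')

# Count how often each tag occurs for each name (rows come back sorted by name)
counts = pd.crosstab(exploded['name'], exploded['tags'])

# Cap the counts at 1 to get presence indicators, then prefix the columns
indicators = counts.clip(upper=1).add_prefix('tags_')
```

Here counts holds 2 for Rob's 'a' tag, while indicators holds 1.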