Search Pandas Column for Substring in other Column

I have an example .csv, imported as df, as follows:

    Ethnicity, Description
  0 French, Irish Dance Company
  1 Italian, Moroccan/Algerian
  2 Danish, Company in Netherlands
  3 Dutch, French
  4 English, EnglishFrench
  5 Irish, Irish-American

I'd like to check df['Description'] for strings in df['Ethnicity']. This should return rows 0, 3, 4, and 5, as those description strings contain strings from the Ethnicity column.

So far I've tried:

df[df['Ethnicity'].str.contains('French')]['Description']

This works for one specific string, but I'd like to check every ethnicity value without searching for each one by hand. I've also tried converting the columns to lists and iterating through them, but can't seem to find a way to return the matching rows, as the data is no longer a DataFrame.

Thank you in advance!

Tags: python string pandas dataframe substring

2 Answers

小情绪 Triste *

#2 · 2019-05-03 12:05

The ever-popular double apply:

df[df.Description.apply(lambda x: df.Ethnicity.apply(lambda y: y in x)).any(axis=1)]

  Ethnicity          Description
0    French  Irish Dance Company
3     Dutch               French
4   English        EnglishFrench
5     Irish       Irish-American

Timing

jezrael's answer is far superior
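If you want to check the timing on your own data, here is a minimal sketch using timeit. The frame is rebuilt from the sample in the question, the helper names double_apply and joined_regex are just labels I made up for the two approaches in this thread, and the numbers will vary with machine, data size, and pandas version.

```python
import timeit

import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Ethnicity": ["French", "Italian", "Danish", "Dutch", "English", "Irish"],
    "Description": [
        "Irish Dance Company",
        "Moroccan/Algerian",
        "Company in Netherlands",
        "French",
        "EnglishFrench",
        "Irish-American",
    ],
})


def double_apply(frame):
    # Boolean matrix: one row per description, one column per ethnicity
    hits = frame.Description.apply(lambda x: frame.Ethnicity.apply(lambda y: y in x))
    return frame[hits.any(axis=1)]


def joined_regex(frame):
    # Single regex alternation built from every ethnicity value
    return frame[frame.Description.str.contains("|".join(frame.Ethnicity))]


result_apply = double_apply(df)
result_regex = joined_regex(df)

# Time 20 runs of each approach; only the relative difference is meaningful
t_apply = timeit.timeit(lambda: double_apply(df), number=20)
t_regex = timeit.timeit(lambda: joined_regex(df), number=20)
print(f"double apply: {t_apply:.4f}s, joined regex: {t_regex:.4f}s")
```

Both helpers select the same rows on this sample. The double apply does one Python-level substring test per (description, ethnicity) pair, so the gap should widen as the frame grows.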


狗以群分

#3 · 2019-05-03 12:09

You can use str.contains with the values of column Ethnicity converted with tolist and then joined by |, which means "or" in a regex:

print ('|'.join(df.Ethnicity.tolist()))
French|Italian|Danish|Dutch|English|Irish

mask = df.Description.str.contains('|'.join(df.Ethnicity.tolist()))
print (mask)
0     True
1    False
2    False
3     True
4     True
5     True
Name: Description, dtype: bool

#boolean-indexing
print (df[mask])
  Ethnicity          Description
0    French  Irish Dance Company
3     Dutch               French
4   English        EnglishFrench
5     Irish       Irish-American

It looks like you can omit tolist():

print (df[df.Description.str.contains('|'.join(df.Ethnicity))])
  Ethnicity          Description
0    French  Irish Dance Company
3     Dutch               French
4   English        EnglishFrench
5     Irish       Irish-American
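One caveat: the joined string is treated as a regex, so values containing characters like +, ( or . can break the pattern or match something unintended, and missing values get in the way too. A minimal sketch of a more defensive version, assuming you want literal substring matching; the sample frame here is hypothetical (not the data from the question) and is only there to show the escaping:

```python
import re

import pandas as pd

# Hypothetical data: "C++" contains regex metacharacters, and the last row is missing
df = pd.DataFrame({
    "Ethnicity": ["French", "C++", "Irish", None],
    "Description": ["Irish Dance Company", "Written in C++", "Plain text", None],
})

# Drop missing values, de-duplicate, and escape each term so it matches literally
terms = df["Ethnicity"].dropna().unique()
pattern = "|".join(re.escape(t) for t in terms)

# na=False treats missing descriptions as "no match" instead of propagating NaN
mask = df["Description"].str.contains(pattern, na=False)
result = df[mask]
print(result)
```

Without re.escape, the unescaped "C++" would make the pattern invalid, so escaping is worth doing whenever the search terms come from data rather than from you.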
