Column contains column 4

Posted 2020-02-07 06:12

I have a dataframe. For each row, I would like to test whether the number in column B appears as a whole number in the string in column A, and store the result in a new column C.

import pandas as pd

df = pd.DataFrame({'A': ["me 123", "me-123", "1234", "me 12", "123 me", "6 you 123-me"],
                   'B': [123,       123,      123,    123,     6,        123]})

I can almost do that using extract:

df['C'] = df.A.str.extract(r'(\d+)', expand=False).astype(float).eq(df.B, axis=0).astype(int)

              A    B  C
0        me 123  123  1
1        me-123  123  1
2          1234  123  0
3         me 12  123  0
4        123 me    6  0
5  6 you 123-me  123  0

However, on the bottom row it does not see the number 123 because of the number 6 that precedes it (extract returns only the first match). I would like to get:

              A    B  C
0        me 123  123  1
1        me-123  123  1
2          1234  123  0
3         me 12  123  0
4        123 me    6  0
5  6 you 123-me  123  1

Tags: pandas
3 answers
一夜七次
#2 · 2020-02-07 06:49

Using findall

[y in x for x, y in zip(df.A.str.findall(r'(\d+)'), df.B.astype(str))]
Out[733]: [True, True, False, False, False, True]
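As a self-contained sketch of the same idea, it may help to look at what findall returns for each row. The sample frame is copied from the question; the names `matches` and `result` are only illustrative. Each string becomes a list of complete digit runs, so `in` tests for an exact match against a whole number rather than a substring:

```python
import pandas as pd

df = pd.DataFrame({'A': ["me 123", "me-123", "1234", "me 12", "123 me", "6 you 123-me"],
                   'B': [123, 123, 123, 123, 6, 123]})

# One list of digit runs per row, e.g. "6 you 123-me" -> ['6', '123']
matches = df.A.str.findall(r'\d+')

# List membership is an exact comparison, so '123' is not found in ['1234']
result = [b in nums for nums, b in zip(matches, df.B.astype(str))]

# Store the flags as 0/1 to match the layout requested in the question
df['C'] = [int(r) for r in result]
```

Comparing strings rather than floats avoids any numeric conversion, at the cost of treating "0123" and "123" as different numbers.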
Rolldiameter
#3 · 2020-02-07 06:52

Use Series.str.extractall to get every number from the column, reshape with Series.unstack so each match gets its own column, compare the values with column B, and add DataFrame.any to test for at least one True per row:

df['C'] = (df.A.str.extractall(r'(\d+)')[0]
               .unstack()
               .astype(float)
               .eq(df.B, axis=0)
               .any(axis=1)
               .astype(int))
print(df)

              A    B  C
0        me 123  123  1
1        me-123  123  1
2          1234  123  0
3         me 12  123  0
4        123 me    6  0
5  6 you 123-me  123  1
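The chain above may be easier to follow when the intermediate frame is inspected. The sketch below reuses the question's sample data and introduces `wide` and `c` as illustrative names; it shows that unstack yields one column per match, padded with NaN where a row has fewer matches:

```python
import pandas as pd

df = pd.DataFrame({'A': ["me 123", "me-123", "1234", "me 12", "123 me", "6 you 123-me"],
                   'B': [123, 123, 123, 123, 6, 123]})

# Long format: one row per (original row, match number), then pivot the
# match number into columns so every original row is a single line again
wide = df.A.str.extractall(r'(\d+)')[0].unstack()

# Row 5 holds both of its numbers; rows with a single match get NaN in column 1
# Compare every column against B row by row, then collapse with any()
c = wide.astype(float).eq(df.B, axis=0).any(axis=1).astype(int)
```

Because the comparison is done on floats, a NaN padding cell never equals a value in B, so the padding cannot produce a false positive.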
仙女界的扛把子
#4 · 2020-02-07 06:55

re.split

Use 'one or more non-digits' (\D+) as the split pattern, so that only the numbers remain:

import re

df.assign(C=[int(str(b) in re.split(r'\D+', a)) for a, b in zip(df.A, df.B)])

              A    B  C
0        me 123  123  1
1        me-123  123  1
2          1234  123  0
3         me 12  123  0
4        123 me    6  0
5  6 you 123-me  123  1
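To see why splitting on non-digits works, here is a small sketch of what re.split returns for the strings in the question (`samples` and `parts` are illustrative names, not part of the answer above):

```python
import re

samples = ["me 123", "me-123", "1234", "me 12", "123 me", "6 you 123-me"]

# Splitting on runs of non-digits leaves only the digit runs, plus an empty
# string wherever the text starts or ends with a non-digit
parts = [re.split(r'\D+', s) for s in samples]

# parts[0] is ['', '123'] and parts[5] is ['6', '123', '']
flags = [int(str(b) in p) for p, b in zip(parts, [123, 123, 123, 123, 6, 123])]
```

The empty strings are harmless here because str(b) is never empty, so they can never match a number from column B.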