Word boundary with regex - cannot extract all words

Posted 2020-02-14 09:53

I need to extract Male-Cat twice from this string:

import re

a = "Male-Cat Male-Cat Male-Cat-Female"
b = re.findall(r'(?:\s|^)Male-Cat(?:\s|$)', a)
print(b)
# ['Male-Cat ']

c = re.findall(r'\bMale-Cat\b', a)
print(c)
# ['Male-Cat', 'Male-Cat', 'Male-Cat']

And I need to extract Male-Cat three times from this one:

a = "Male-Cat Male-Cat Male-Cat"
b = re.findall(r'(?:\s|^)Male-Cat(?:\s|$)', a)
print(b)
# ['Male-Cat ', ' Male-Cat']

c = re.findall(r'\bMale-Cat\b', a)
print(c)
# ['Male-Cat', 'Male-Cat', 'Male-Cat']

Other strings that the first approach parses correctly:

a = 'Male-Cat Female-Cat Male-Cat-Female Male-Cat'
a = 'Male-Cat-Female'
a = 'Male-Cat'

Am I missing something? Can you explain what is wrong and what the correct way is?

Tags: python regex string findall boundary

1 answer

我想做一个坏孩纸

Reply #2 · 2020-02-14 10:29

Use lookarounds to extract words inside whitespace boundaries:

r'(?<!\S)Male-Cat(?!\S)'

See the online regex demo

Details

(?<!\S) - a whitespace or start of string must appear immediately to the left of the current location
Male-Cat - the term to search for
(?!\S) - a whitespace or end of string must appear immediately to the right of the current location
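As a quick check, the pattern can be run against every string from the question. This is only a sketch: the names PATTERN, samples, and results are made up for illustration, and the expected counts follow from the question's stated goals (two matches for the first string, three for the second).

```python
import re

# Pattern from the answer: Male-Cat delimited by whitespace or the string edges.
PATTERN = re.compile(r'(?<!\S)Male-Cat(?!\S)')

# Sample strings taken from the question.
samples = [
    "Male-Cat Male-Cat Male-Cat-Female",
    "Male-Cat Male-Cat Male-Cat",
    "Male-Cat Female-Cat Male-Cat-Female Male-Cat",
    "Male-Cat-Female",
    "Male-Cat",
]

results = [PATTERN.findall(s) for s in samples]
for s, r in zip(samples, results):
    print(f"{s!r}: {len(r)} match(es)")
# Match counts, in order: 2, 3, 2, 0, 1
```

Note that Male-Cat-Female is never matched, because the hyphen after Male-Cat is a non-whitespace character and so (?!\S) fails there. That is exactly where \b goes wrong: it sees a word boundary between "t" and "-".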

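Since (?<!\S) and (?!\S) are zero-width assertions, the whitespace is not consumed, so consecutive matches that share a single separating space are all found. The original pattern (?:\s|^)Male-Cat(?:\s|$) consumes the trailing space, which leaves no leading whitespace for the next Male-Cat to match against.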
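The difference in consumption is easier to see by comparing match spans instead of matched text. A minimal sketch using re.finditer; the variable names are illustrative only:

```python
import re

text = "Male-Cat Male-Cat Male-Cat"

# Pattern from the question: the groups consume the surrounding whitespace.
consuming = re.compile(r'(?:\s|^)Male-Cat(?:\s|$)')
# Pattern from the answer: the lookarounds only assert, they consume nothing.
zero_width = re.compile(r'(?<!\S)Male-Cat(?!\S)')

consuming_spans = [m.span() for m in consuming.finditer(text)]
zero_width_spans = [m.span() for m in zero_width.finditer(text)]

print(consuming_spans)   # [(0, 9), (17, 26)]
print(zero_width_spans)  # [(0, 8), (9, 17), (18, 26)]
```

The first consuming match ends at index 9, past the space at index 8. The search resumes at the "M" of the second Male-Cat, where neither \s nor ^ can match, so that occurrence is skipped.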
