How to exclude a specific string using regex in Python

Posted 2019-07-23 07:33

I'd like to match strings like:

45 meters?
45, meters?
45?
45 ?

but not strings like:

45 meters you?
45 you  ?
45, and you?

In both cases the question mark must be at the end. Essentially, I want to exclude all strings that contain the word "you".

I've tried the following regex:

r'\d+.*(?!you)\?$'

but it also matches the strings in the second group (probably because of the .*).
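A quick way to see the problem, as a sketch using only the standard re module: the lookahead (?!you) is evaluated at the position just before the final \?, where the next character is always ?, so it can never see "you" and always succeeds. The snippet below runs the question's pattern over the example strings (the names examples and naive are just for illustration).

```python
import re

examples = [
    "45 meters?", "45, meters?", "45?", "45 ?",      # should match
    "45 meters you?", "45 you  ?", "45, and you?",   # should NOT match
]

# The pattern from the question. The lookahead is checked right before "?",
# where the upcoming text is "?" rather than "you", so it never rejects anything.
naive = re.compile(r"\d+.*(?!you)\?$")

for s in examples:
    print(repr(s), bool(naive.search(s)))  # prints True for all seven strings
```

The fix is to move the "no you" check so that it looks across the whole string, for example by placing a lookahead at the start of the pattern.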

2 answers

骚年 ilove
Post #2 · 2019-07-23 07:56

You could try this regex, which matches every line that ends with ? and does not contain the string you:

^(?!.*you).*\?$

Explanation:

This regex uses a negative lookahead. Anchored at the start of the line, (?!.*you) fails if the string you appears anywhere later on the line, so those lines are rejected. For every other line, .*\?$ then requires the line to end with ?.
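A minimal usage sketch in Python, assuming the input is a single multi-line string: re.MULTILINE makes ^ and $ match at each line boundary, and because . does not match a newline by default, the lookahead only inspects the current line. The sample text reuses the strings from the question; the variable names are illustrative.

```python
import re

# Negative lookahead at ^ rejects any line containing "you";
# .*\?$ then requires the line to end with "?".
pattern = re.compile(r"^(?!.*you).*\?$", re.MULTILINE)

text = """45 meters?
45, meters?
45?
45 ?
45 meters you?
45 you  ?
45, and you?"""

matches = pattern.findall(text)
print(matches)  # ['45 meters?', '45, meters?', '45?', '45 ?']
```

One caveat, as an observation rather than part of the original answer: you also matches inside words such as your or young. If that matters, (?!.*\byou\b) is a possible refinement.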

家丑人穷心不美
Post #3 · 2019-07-23 07:57

There's a neat trick to exclude some matches from a regex, which you can use here:

>>> import re
>>> corpus = """
... 45 meters?
... 45?
... 45 ?
... 45 meters you?
... 45 you  ?
... 45, and you?
... """
>>> pattern = re.compile(r"\d+[^?]*you|(\d+[^?]*\?)")
>>> re.findall(pattern, corpus)
['45 meters?', '45?', '45 ?', '', '', '']

The downside is that you get empty matches when the exclusion kicks in, but those are easily filtered out:

>>> list(filter(None, re.findall(pattern, corpus)))
['45 meters?', '45?', '45 ?']

How it works:

The trick is that when the pattern contains a capturing group, re.findall returns only what that group captured, so we only pay attention to captured groups. The left-hand side of the alternation, \d+[^?]*you ("digits, followed by non-? characters, followed by 'you'"), matches what you don't want and captures nothing, so each such match shows up as an empty string and is then forgotten. Only where the left-hand side fails to match is the right-hand side, (\d+[^?]*\?) ("digits, followed by non-? characters, followed by '?'"), tried, and that one is captured.
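The same idea can be written with finditer, which makes the "ignore the uncaptured branch" step explicit: a match of the left branch leaves group 1 as None, and only the captured right-hand matches are kept. This is a sketch; the corpus adds "45, meters?" from the question, and the name wanted is illustrative.

```python
import re

corpus = """
45 meters?
45, meters?
45?
45 ?
45 meters you?
45 you  ?
45, and you?
"""

# Left branch: consumes unwanted "digits ... you" text without capturing it.
# Right branch: captures "digits ... ?" only where the left branch failed.
pattern = re.compile(r"\d+[^?]*you|(\d+[^?]*\?)")

# Keep only matches where the capturing group actually participated.
wanted = [m.group(1) for m in pattern.finditer(corpus) if m.group(1) is not None]
print(wanted)  # ['45 meters?', '45, meters?', '45?', '45 ?']
```

One caveat beyond the original answer: [^?] also matches newlines, so on multi-line input a line without a ? can run on into the next line. Using [^?\n] in both branches is a possible tightening if that case can occur.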
