Regular Expressions and negating a whole character

This question already has an answer here:

Regular expression to match a line that doesn't contain a word? 27 answers

I'm attempting something that I feel should be fairly obvious to me, but it's not. I'm trying to match a string that does NOT contain a specific sequence of characters. I've tried using [^ab], [^(ab)], etc. to match strings containing no 'a's or 'b's, only 'a's, only 'b's, or 'ba', but not to match 'ab'. It's true that the examples I gave won't match 'ab', but they also won't match 'a' alone, and I need them to. Is there some simple way to do this?

Tags: regex

9 answers

情到深处是孤独

Post #2 · 2019-01-01 14:41

The simplest way is to pull the negation out of the regular expression entirely:

if (!userName.matches("^([Ss]ys)?admin$")) { ... }
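A self-contained Java version of that snippet might look like the sketch below. The helper name `isNotAdmin` and the sample names are hypothetical, chosen only for illustration: the regex describes what you do *not* want, and the `!` outside it does the negating.

```java
public class NegateOutsideRegex {

    // Hypothetical helper: true when userName is NOT "admin", "sysadmin" or "Sysadmin".
    // String.matches() must match the whole input, so ^ and $ are redundant but harmless.
    static boolean isNotAdmin(String userName) {
        return !userName.matches("^([Ss]ys)?admin$");
    }

    public static void main(String[] args) {
        String[] names = {"admin", "Sysadmin", "sysadmin", "administrator", "bob"};
        for (String name : names) {
            System.out.println(name + " -> " + isNotAdmin(name));
        }
    }
}
```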

裙下三千臣

Post #3 · 2019-01-01 14:42

A character class such as [^ab] matches a single character that is not in the set of characters (the ^ is the negating part).

To match a string which does not contain the multi-character sequence ab, you want to use a negative lookahead:

^(?:(?!ab).)+$

And here is the above expression dissected in regex comment mode:

(?x)    # enable regex comment mode
^       # match start of line/string
(?:     # begin non-capturing group
  (?!   # begin negative lookahead
    ab  # literal text sequence ab
  )     # end negative lookahead
  .     # any single character
)       # end non-capturing group
+       # repeat previous match one or more times
$       # match end of line/string
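A sketch of the same commented pattern compiled in Java, whose `java.util.regex` supports both the `(?x)` comment flag and lookahead; the class and method names are hypothetical. Because of the `+`, the empty string is rejected.

```java
import java.util.regex.Pattern;

public class TemperedToken {

    // The answer's pattern in comment mode: (?x) makes Java ignore whitespace
    // and treat everything from # to the end of a line as a comment.
    static final Pattern NO_AB = Pattern.compile(
            "(?x)          # enable comment mode\n"
          + "^             # start of string\n"
          + "(?:           # begin non-capturing group\n"
          + "  (?!ab)      # fail if 'ab' starts at this position\n"
          + "  .           # otherwise consume one character\n"
          + ")+            # repeat one or more times\n"
          + "$             # end of string\n");

    // True when s is non-empty and contains no "ab".
    static boolean noAb(String s) {
        return NO_AB.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"a", "b", "ba", "bbaa", "ab", "cab", ""};
        for (String s : inputs) {
            System.out.println("\"" + s + "\" -> " + noAb(s));
        }
    }
}
```

Only "ab" and "cab" (and the empty string) are rejected here, which covers the cases the question asks about.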

皆成旧梦

Post #4 · 2019-01-01 14:47

Using a negated character class as you described is the simple way, as far as I am aware. If you want to exclude a whole range, you could use [^a-f].
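To make the limits of a negated class concrete, here is a small Java sketch (method names hypothetical). A class always describes one character at a time, so it can exclude letters but not a multi-character sequence such as "ab".

```java
public class NegatedClass {

    // [^a-f] matches exactly ONE character that is not in the range a..f.
    static boolean oneCharOutsideAtoF(String s) {
        return s.matches("[^a-f]");
    }

    // [^a-f]+ requires EVERY character of the string to be outside a..f.
    static boolean allCharsOutsideAtoF(String s) {
        return s.matches("[^a-f]+");
    }

    public static void main(String[] args) {
        System.out.println(oneCharOutsideAtoF("x"));      // true
        System.out.println(oneCharOutsideAtoF("xy"));     // false: two characters
        System.out.println(allCharsOutsideAtoF("xyz"));   // true
        System.out.println(allCharsOutsideAtoF("zebra")); // false: contains e, b and a
        // The question's problem: [^ab]+ rejects 'a' on its own, not just the sequence "ab".
        System.out.println("a".matches("[^ab]+"));        // false
    }
}
```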

春风洒进眼中

Post #5 · 2019-01-01 14:49

Use negative lookahead:

^(?!.*ab).*$
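A minimal Java sketch of this anchored lookahead (class and method names hypothetical), assuming single-line input, since `.` does not match line terminators by default. Unlike the `(?:(?!ab).)+` version, `.*` also accepts the empty string.

```java
import java.util.regex.Pattern;

public class AnchoredLookahead {

    // At the start of the string, the lookahead scans forward with .*ab; if "ab"
    // occurs anywhere, the whole match fails. Otherwise .*$ consumes the entire string.
    static final Pattern NO_AB = Pattern.compile("^(?!.*ab).*$");

    static boolean noAb(String s) {
        return NO_AB.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"a", "ba", "bamboo", "ab", "baobab", ""};
        for (String s : inputs) {
            System.out.println("\"" + s + "\" -> " + noAb(s));
        }
    }
}
```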

UPDATE: In the comments below, I stated that this approach is slower than the one given in Peter's answer. I've run some tests since then and found that it's actually slightly faster. However, the reason to prefer this technique over the other is not speed, but simplicity.

The other technique, described here as a tempered greedy token, is suitable for more complex problems, like matching delimited text where the delimiters consist of multiple characters (like HTML, as Luke commented below). For the problem described in the question, it's overkill.

For anyone who's interested, I tested with a large chunk of Lorem Ipsum text, counting the number of lines that don't contain the word "quo". These are the regexes I used:

(?m)^(?!.*\bquo\b).+$

(?m)^(?:(?!\bquo\b).)+$
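A sketch of how such a line count could be done in Java with both regexes (note the doubled backslashes in Java string literals). The sample text is hypothetical and far too small for timing; it only shows that the two patterns select the same lines, including that "quod" is kept because `\b` requires a word boundary after "quo".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CountLinesWithoutWord {

    // The answer's two patterns: an anchored lookahead vs. a tempered greedy token.
    // (?m) makes ^ and $ match at the start and end of every line.
    static final Pattern ANCHORED = Pattern.compile("(?m)^(?!.*\\bquo\\b).+$");
    static final Pattern TEMPERED = Pattern.compile("(?m)^(?:(?!\\bquo\\b).)+$");

    // Counts how many lines the pattern matches (each match is one whole line).
    static int countMatches(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical sample, not the Lorem Ipsum text used in the answer's benchmark.
        String text = String.join("\n",
                "Lorem ipsum dolor sit amet",
                "quo usque tandem",
                "quod erat demonstrandum",
                "sed quo vadis",
                "consectetur adipiscing elit");
        System.out.println("anchored: " + countMatches(ANCHORED, text));
        System.out.println("tempered: " + countMatches(TEMPERED, text));
    }
}
```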

Whether I search for matches in the whole text, or break it up into lines and match them individually, the anchored lookahead consistently outperforms the floating one.

裙下三千臣

Post #6 · 2019-01-01 14:50

Just search for "ab" in the string then negate the result:

!/ab/.test("bamboo"); // true
!/ab/.test("baobab"); // false
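The same "search, then negate" idea in Java might look like this sketch (class and method names hypothetical). `Matcher.find()` succeeds if the pattern occurs anywhere in the input, which is what JavaScript's `test()` checks for an unanchored, non-global regex.

```java
import java.util.regex.Pattern;

public class NegateFind {

    // Compiled once and reused; find() looks for "ab" anywhere in the input.
    private static final Pattern AB = Pattern.compile("ab");

    static boolean hasNoAb(String s) {
        return !AB.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasNoAb("bamboo")); // true
        System.out.println(hasNoAb("baobab")); // false
    }
}
```

For a fixed literal like "ab", `!s.contains("ab")` does the same job without a regex at all.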

It seems easier and should be faster too.

只若初见

Post #7 · 2019-01-01 14:54

Yes, it's called a negative lookahead. It looks like this: (?!regex here). So abc(?!def) will match abc when it is not followed by def; for example, it will match the abc in abce, abc, abck, etc.

Similarly, there is a positive lookahead: (?=regex here). So abc(?=def) will match abc only when it is followed by def.

There are also negative and positive lookbehinds: (?<!regex here) and (?<=regex here), respectively.
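The four lookaround forms can be tried side by side in Java, which supports all of them; this is a sketch with a hypothetical helper `found` and made-up inputs.

```java
import java.util.regex.Pattern;

public class LookaroundDemo {

    // True if the regex matches somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Negative lookahead: "abc" not followed by "def"
        System.out.println(found("abc(?!def)", "abce"));    // true
        System.out.println(found("abc(?!def)", "abcdef"));  // false
        // Positive lookahead: "abc" followed by "def"
        System.out.println(found("abc(?=def)", "abcdef"));  // true
        // Negative lookbehind: "def" not preceded by "abc"
        System.out.println(found("(?<!abc)def", "xyzdef")); // true
        System.out.println(found("(?<!abc)def", "abcdef")); // false
        // Positive lookbehind: "def" preceded by "abc"
        System.out.println(found("(?<=abc)def", "abcdef")); // true
    }
}
```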

One point to note is that lookarounds are zero-width. That is, they check the text around the current position without consuming any characters.

So it may look like a(?=b)c will match "abc", but it won't. It matches 'a', then the positive lookahead checks for 'b' without moving forward in the string, so it then tries to match 'c' against that same 'b', which fails. Similarly, ^a(?=b)b$ will match 'ab' but not 'abb', because the lookahead and the literal b both examine the same character.
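The zero-width behaviour is easy to confirm in Java; this sketch uses a hypothetical helper `firstMatch` to show that a lookahead adds nothing to the matched text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroWidthDemo {

    // Returns the text of the first match, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // After 'a', (?=b) only peeks at 'b'; the engine is still in front of 'b', so 'c' fails.
        System.out.println("abc".matches("a(?=b)c"));   // false
        // (?=b) and the literal b examine the same character.
        System.out.println("ab".matches("^a(?=b)b$"));  // true
        System.out.println("abb".matches("^a(?=b)b$")); // false
        // The lookahead contributes nothing to the matched text.
        System.out.println(firstMatch("abc(?!def)", "abce")); // abc
    }
}
```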

More information on this page
