What is the meaning of + in a regex?

Posted 2019-01-04 02:21

What does the plus symbol in regex mean?

Tags: regex symbol
4 answers
Emotional °昔
#2 · 2019-01-04 02:58

In most implementations + means "one or more".

In some theoretical writings + is used to mean "or" (most implementations use the | symbol for that).

兄弟一词,经得起流年.
#3 · 2019-01-04 03:09

One or more occurrences of the preceding expression.

E.g. a+ means the letter a one or more times. Thus, a+ matches a, aa, aaaaaa but not the empty string.

If you know what the asterisk (*) means, then you can express (exp)+ as (exp)(exp)*, where (exp) is any regular expression.
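A quick way to check this equivalence is to run both forms over the same inputs. This is only a sketch using Python's re module; the sample strings and variable names are made up for illustration.

```python
import re

# Sample inputs; the empty string must be rejected by "+".
samples = ["", "a", "aa", "aaaaaa", "b"]

plus_form = re.compile(r"a+")
star_form = re.compile(r"aa*")  # (exp)(exp)* with exp = a

plus_results = [bool(plus_form.fullmatch(s)) for s in samples]
star_results = [bool(star_form.fullmatch(s)) for s in samples]

print(plus_results)                  # [False, True, True, True, False]
print(plus_results == star_results)  # True

# The same holds when exp is a longer expression, here the group "ab".
group_samples = ["", "ab", "abab", "aba"]
group_plus = [bool(re.fullmatch(r"(?:ab)+", s)) for s in group_samples]
group_star = [bool(re.fullmatch(r"(?:ab)(?:ab)*", s)) for s in group_samples]

print(group_plus)                # [False, True, True, False]
print(group_plus == group_star)  # True
```

fullmatch is used so that the whole string has to match, which makes the empty-string case visible.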

再贱就再见
#4 · 2019-01-04 03:12

+ can actually have two meanings, depending on context.

As the other answers mentioned, + is usually a repetition operator that causes the preceding token to repeat one or more times. a+ would be expressed as aa* in formal language theory, and could also be expressed as a{1,} (match at least once, with no upper limit).


However, + can also make other quantifiers possessive if it follows a repetition operator (i.e. ?+, *+, ++ or {m,n}+). A possessive quantifier is an advanced feature of some regex flavours (PCRE, Java and the JGsoft engine) which tells the engine not to backtrack once a match has been made.

To understand how this works, we need to understand two concepts of regex engines: greediness and backtracking. Greediness means that in general regexes will try to consume as many characters as they can. Let's say our pattern is .* (the dot is a special construct in regexes which means any character [1]; the star means match zero or more times), and our target is aaaaaaaab. The entire string will be consumed, because the entire string is the longest match that satisfies the pattern.

However, let's say we change the pattern to .*b. Now, when the regex engine tries to match against aaaaaaaab, the .* will again consume the entire string. However, since the engine will have reached the end of the string and the pattern is not yet satisfied (the .* consumed everything but the pattern still has to match b afterwards), it will backtrack, one character at a time, and try to match b. The first backtrack will make the .* consume aaaaaaaa, and then b can consume b, and the pattern succeeds.
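To see the result of that backtracking, here is a small sketch in Python's re module. The capturing groups are my addition, there only to show how the string ends up split between .* and b.

```python
import re

target = "aaaaaaaab"

# Greedy .* on its own takes the whole string.
greedy = re.match(r".*", target)
print(greedy.group())  # aaaaaaaab

# With a trailing b, .* first takes everything, then gives back one
# character so that b can match. The groups show the final split.
m = re.match(r"(.*)(b)", target)
print(m.group(1))  # aaaaaaaa
print(m.group(2))  # b
```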

Possessive quantifiers are also greedy, but as mentioned, once they have matched, the engine can no longer backtrack into them. So if we change our pattern to .*+b (match any character zero or more times, possessively, followed by a b), and try to match aaaaaaaab, again the .*+ will consume the whole string, but since it is possessive, the backtracking information is discarded, the b cannot be matched, and the pattern fails.
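The difference can be tried out in Python as well, with one caveat: Python's re module only accepts possessive quantifiers from version 3.11 onwards, so this sketch probes for support first. The helper possessive_supported is a name I made up, not part of any library.

```python
import re

target = "aaaaaaaab"

def possessive_supported():
    """Return True if this Python's re module accepts possessive quantifiers."""
    try:
        re.compile(r"a*+")
        return True
    except re.error:
        # Older versions reject "*+" as an invalid repeat.
        return False

# Ordinary greedy quantifier: backtracking lets the trailing b match.
backtracking_match = re.search(r".*b", target)
print(backtracking_match.group())  # aaaaaaaab

if possessive_supported():
    # .*+ keeps everything it consumed, so b never gets a chance to match.
    possessive_match = re.search(r".*+b", target)
    print(possessive_match)  # None
else:
    print("possessive quantifiers are not supported by this Python version")
```

In Java or PCRE the same two patterns should behave the same way without any version check.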


[1] In most engines, the dot will not match a newline character, unless the /s ("singleline" or "dotall") modifier is specified.

smile是对你的礼貌
#5 · 2019-01-04 03:22

One or more of the preceding expression.

[0-9]+

Would match:

1234567890

In:

I have 1234567890 dollars
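The example above can be reproduced with a short Python snippet (just a sketch; the variable names are mine):

```python
import re

text = "I have 1234567890 dollars"

# [0-9]+ matches the first run of one or more digits.
m = re.search(r"[0-9]+", text)
print(m.group())  # 1234567890
print(m.span())   # (7, 17)
```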
