What does the plus symbol in regex mean?
In most implementations, `+` means "one or more". In some theoretical writings, `+` is used to mean "or" (most implementations use the `|` symbol for that).

One or more occurrences of the preceding symbol.
E.g. `a+` means the letter `a` one or more times. Thus, `a+` matches `a`, `aa`, `aaaaaa`, but not an empty string.

If you know what the asterisk (`*`) means, then you can express `(exp)+` as `(exp)(exp)*`, where `(exp)` is any regular expression.

`+` can actually have two meanings, depending on context.

Like the other answers mentioned, `+` usually is a repetition operator, and causes the preceding token to repeat one or more times. `a+` would be expressed as `aa*` in formal language theory, and could also be expressed as `a{1,}` (match at least once, with no upper limit).

However,
`+` can also make other quantifiers possessive if it follows a repetition operator (i.e. `?+`, `*+`, `++` or `{m,n}+`). A possessive quantifier is an advanced feature of some regex flavours (PCRE, Java and the JGsoft engine) which tells the engine not to backtrack once a match has been made.

To understand how this works, we need to understand two concepts of regex engines: greediness and backtracking. Greediness means that in general regexes will try to consume as many characters as they can. Let's say our pattern is `.*` (the dot is a special construct in regexes which means any character¹; the star means match zero or more times), and our target is `aaaaaaaab`. The entire string will be consumed, because the entire string is the longest match that satisfies the pattern.

However, let's say we change the pattern to
`.*b`. Now, when the regex engine tries to match against `aaaaaaaab`, the `.*` will again consume the entire string. However, since the engine will have reached the end of the string and the pattern is not yet satisfied (the `.*` consumed everything but the pattern still has to match `b` afterwards), it will backtrack, one character at a time, and try to match `b`. The first backtrack will make the `.*` consume `aaaaaaaa`, and then `b` can consume `b`, and the pattern succeeds.

Possessive quantifiers are also greedy, but as mentioned, once they return a match, the engine can no longer backtrack past that point. So if we change our pattern to
`.*+b` (match any character zero or more times, possessively, followed by a `b`), and try to match `aaaaaaaab`, again the `.*` will consume the whole string, but then since it is possessive, backtracking information is discarded, and the `b` cannot be matched, so the pattern fails.

¹ In most engines, the dot will not match a newline character, unless the `/s` ("singleline" or "dotall") modifier is specified.

1 or more of the previous expression.
For example, `[0-9]+` would match any run of one or more consecutive digits.
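The "one or more" reading can be checked with a short script. This is only a sketch: the thread does not name a language, so Python's standard `re` module is an assumption here, and the sample strings are made up for illustration. It checks that `a+`, `aa*` and `a{1,}` accept the same inputs, and shows `[0-9]+` picking out runs of digits.

```python
import re

# Hypothetical sample inputs, chosen only for illustration.
samples = ["", "a", "aa", "aaaaaa", "b"]

# `a+`, `aa*` and `a{1,}` should accept exactly the same strings.
patterns = ["a+", "aa*", "a{1,}"]
results = {p: [bool(re.fullmatch(p, s)) for s in samples] for p in patterns}

for pattern, accepted in results.items():
    print(pattern, accepted)  # [False, True, True, True, False] for each pattern

# `[0-9]+` matches each run of one or more consecutive digits.
digit_runs = re.findall(r"[0-9]+", "order 66, aisle 7, shelf 1024")
print(digit_runs)  # ['66', '7', '1024']
```

The empty string and `b` are rejected by all three patterns, which is the difference from `a*`: the star would also accept the empty string.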
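The greedy and possessive behaviour described in the longer answer above can also be observed directly. The sketch below again assumes Python's `re` module, which is not one of the flavours the answer lists; as far as I know it only accepts possessive quantifiers from version 3.11 onwards, so that part is guarded by a version check. The capture group in `(.*)b` is added purely to expose what `.*` ended up consuming.

```python
import re
import sys

target = "aaaaaaaab"

# Greedy `.*` on its own consumes the whole string.
greedy_all = re.match(r".*", target).group()

# With a trailing `b`, the engine backtracks one character so `b` can match.
# The group is only there to show what `.*` kept after backtracking.
captured_by_star = re.match(r"(.*)b", target).group(1)

# Possessive syntax is rejected by older interpreters, so guard the call.
possessive_supported = sys.version_info >= (3, 11)
possessive_b = re.match(r".*+b", target) if possessive_supported else None

print(greedy_all)        # aaaaaaaab
print(captured_by_star)  # aaaaaaaa
if possessive_supported:
    print(possessive_b)  # None: `.*+` keeps the final `b`, so the pattern fails
```

On an interpreter older than 3.11 the last check is skipped rather than emulated, so only the greedy half of the comparison is shown there.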