I am trying to write a regular expression but it’s being too greedy. The input string could be in either of the following formats:
STUFF_12_1234 or STUFF_1234
What I want to do is create a regular expression to grab the characters after the last _. So in the above examples that would be the numbers “1234”. The number of characters after this last _ varies, and they could be a combination of letters and numbers. I have tried the following expression:
_(.*?)\Z
This works for “STUFF_1234” by returning “1234” but when I use it against “STUFF_12_1234” it returns “12_1234”
Can anyone advise on how the expression should be changed to fix this?
Two options.
With regex:
You just need to avoid matching the _ character, or, if you know the characters are numeric, match only digits (\d).
With substring: take the part of the string that follows the last _.
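A minimal C# sketch of both options, assuming .NET's System.Text.RegularExpressions. The class and method names are hypothetical, and the patterns are one plausible reading of the advice above, not necessarily the answerer's exact code.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper names, for illustration only.
public static class LastSegment
{
    // Regex option 1: a run of non-underscore characters anchored at the end.
    public static string ByNegatedClass(string input) =>
        Regex.Match(input, @"[^_]+$").Value;

    // Regex option 2: only appropriate when the tail is known to be all digits.
    public static string ByDigits(string input) =>
        Regex.Match(input, @"\d+$").Value;

    // Substring option: everything after the last underscore.
    // LastIndexOf returns -1 when there is no underscore, so the whole
    // string is returned in that case.
    public static string BySubstring(string input) =>
        input.Substring(input.LastIndexOf('_') + 1);
}
```

For both sample inputs, STUFF_12_1234 and STUFF_1234, all three methods return 1234. The substring version avoids the regex engine entirely, which may be preferable when the separator is a single fixed character.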
There are at least 3 ways to grab the text appearing after the last underscore _:
1. Keep the current regex, but specify RightToLeft in RegexOptions. Since the regex is searched from right to left, the lazy quantifier will match as few characters as possible, up to just after the last _ in the string.
2. Modify the regex to disallow underscore _ in the text you want to match.
3. Split the input string by _ and pick the last item. For this, String.Split is sufficient; there is no need for Regex.Split.
Use the regexp
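The three approaches listed above might look roughly like this in C#. This is a sketch under assumptions: the class and method names are made up for illustration, and the second pattern is just one way to disallow underscores in the captured text.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper names, for illustration only.
public static class AfterLastUnderscore
{
    // 1. The pattern from the question, scanned from the end of the string.
    //    The lazy group grows leftwards and stops at the nearest underscore.
    public static string RightToLeft(string input) =>
        Regex.Match(input, @"_(.*?)\Z", RegexOptions.RightToLeft).Groups[1].Value;

    // 2. The captured text may not contain an underscore, so only the
    //    last underscore can start a successful match.
    public static string NoUnderscore(string input) =>
        Regex.Match(input, @"_([^_]*)\Z").Groups[1].Value;

    // 3. No regex at all: split on '_' and take the last piece.
    public static string BySplit(string input)
    {
        string[] parts = input.Split('_');
        return parts[parts.Length - 1];
    }
}
```

Each method returns 1234 for both STUFF_12_1234 and STUFF_1234. The two regex versions return an empty string when the input has no underscore, whereas the split version returns the whole input.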
Regular expressions search left-to-right, so greediness controls whether they stop early or late. But it won't change the location of the left end of the match.
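To make this concrete, here is a small C# sketch that reports where a match starts and what it captures. The helper name is hypothetical, and the greedy pattern is an assumed variation of the expression from the question.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class GreedinessDemo
{
    // Returns "startIndex:capture" for the first match of pattern in input.
    public static string Describe(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Index + ":" + m.Groups[1].Value;
    }
}
```

Calling `GreedinessDemo.Describe("STUFF_12_1234", @"_(.*?)\Z")` and `GreedinessDemo.Describe("STUFF_12_1234", @"_(.*)\Z")` both give `5:12_1234`. The lazy and greedy versions start at the first underscore, and because `\Z` forces the match to reach the end of the string, both capture the same text.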
Try this:
Exclude the _ from the list of valid chars:
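One way this could look in C#, as a sketch: it assumes the tail consists only of ASCII letters and digits, as the question describes. The class name and the exact character class are illustrative assumptions.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class TailMatcher
{
    // Assumption: valid characters are ASCII letters and digits.
    // The underscore is not in the class, so the capture cannot span one.
    private static readonly Regex Tail = new Regex(@"_([A-Za-z0-9]*)\Z");

    public static string Extract(string input) =>
        Tail.Match(input).Groups[1].Value;
}
```

With this pattern, `TailMatcher.Extract("STUFF_12_1234")` returns 1234. Note that `\w` would not work as the character class here, since it also matches the underscore.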