What's the benefit of using /.*?/

Posted 2019-04-05 19:53

In some Rails code (Cucumber step definitions, JavaScript files, the rails_admin gem) I found regular expressions like this one:

string =~ /some regexp.+rules should match "(.*?)"/i

I have some knowledge of regular expressions, and I know that the * and ? symbols are similar: the asterisk means zero or more, while the question mark means the preceding item may or may not be present.

So putting a question mark after a group of symbols makes that group optional in the string being tested. What's the... well... the trick of using it after a group that is already optional (the asterisk already makes it optional, AFAIK)?

4 answers
我想做一个坏孩纸
#2 · 2019-04-05 20:15

Right after a quantifier (like *), the ? has a different meaning: it makes that quantifier "ungreedy". So while the default is that * consumes as much as possible, *? matches as little as possible.

In your specific case, this is relevant for strings like this:

some regexp rules should match "some string" or "another"

Without the question mark, the regex matches the full string (because .* can consume " just like any other character), and some string" or "another is captured. With the question mark, the match stops as soon as possible (right after ...some string") and captures only some string.
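The difference described above can be checked directly in Ruby. This is a small sketch, not code from the original project: the variable names are made up for illustration, and the greedy pattern is simply the question's regex with the ? removed.

```ruby
# The sample string from this answer.
text = 'some regexp rules should match "some string" or "another"'

# Same pattern twice: once with a greedy group, once with a lazy one.
greedy = /some regexp.+rules should match "(.*)"/i
lazy   = /some regexp.+rules should match "(.*?)"/i

# String#[] with a regexp and a group index returns that capture group.
greedy_capture = text[greedy, 1]
lazy_capture   = text[lazy, 1]

puts greedy_capture # => some string" or "another
puts lazy_capture   # => some string
```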


我只想做你的唯一
#3 · 2019-04-05 20:15

It makes the search non-greedy. That means it settles for the shortest match it can find from where the match starts, not the longest.

闹够了就滚
#4 · 2019-04-05 20:19

Consider this string:

"<person>1</person><person>2</person>"

The regex

<person>.*</person>

matches the whole string, <person>1</person><person>2</person>. So .* is greedy.

The regex

<person>.*?</person>

matches <person>1</person> first, and <person>2</person> on the next match. So .*? is lazy.
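To see both behaviours side by side, here is a short Ruby sketch using String#scan, which returns every successive match. The variable names are only illustrative.

```ruby
xml = '<person>1</person><person>2</person>'

# Greedy: .* runs to the last </person>, so there is a single match.
greedy_matches = xml.scan(/<person>.*<\/person>/)

# Lazy: .*? stops at the first </person>, so each element matches separately.
lazy_matches = xml.scan(/<person>.*?<\/person>/)

p greedy_matches # => ["<person>1</person><person>2</person>"]
p lazy_matches   # => ["<person>1</person>", "<person>2</person>"]
```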

何必那么认真
#5 · 2019-04-05 20:38

? has a dual meaning.

/foo?/

means the last o can be there zero or one times.

/foo*?/ 

means the last o can appear zero or more times, but the engine takes as few as it can, i.e., the quantifier is non-greedy.

These might help explain:

'foo'[/foo?/]   # => "foo"
'fo'[/foo?/]    # => "fo"
'fo'[/foo*?/]   # => "fo"
'foo'[/foo*?/]  # => "fo"
'fooo'[/foo*?/] # => "fo"

The non-greedy use of ? is unfortunate, I think. They reused an operator we expected to have a single meaning, "zero or one", and threw it at us in a way that can be really difficult to decipher.

But the need was genuine: too many times we'd write a pattern that would go wildly wrong, gobbling everything in sight, because the regex engine was doing exactly what we said with character patterns we hadn't foreseen. Regex can be very complex and convoluted, but the "non-greedy" use of ? helps tame that. Sometimes using it is the sloppy or quick-n-dirty way out, because we don't have time to rewrite the pattern to do it correctly. Sometimes it's the magic bullet and is elegant. I think which it is depends on whether you're under a deadline and writing code to get something done, or you're debugging years after the fact and finally find that ? wasn't the optimal fix.
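As one possible reading of "rewrite the pattern to do it correctly", a common alternative to .*? for quoted text is a negated character class such as [^"]*, which can never cross a quote. The sketch below is my own illustration, reusing the sample string from an earlier answer; the patterns and variable names are assumptions, not something taken from the code in the question.

```ruby
text = 'some regexp rules should match "some string" or "another"'

# In the simple case both forms capture the same thing.
lazy_capture    = text[/should match "(.*?)"/, 1]
negated_capture = text[/should match "([^"]*)"/, 1]

# Anchoring the pattern to the end of the line shows the difference:
# the lazy group is allowed to stretch across quotes to make the match work,
# while the negated class cannot, so the match moves to the last quoted part.
lazy_anchored    = text[/"(.*?)"$/, 1]
negated_anchored = text[/"([^"]*)"$/, 1]

puts lazy_capture     # => some string
puts negated_capture  # => some string
puts lazy_anchored    # => some string" or "another
puts negated_anchored # => another
```

Whether the negated class is the better fix depends on the input: it assumes the quoted text never contains an escaped quote.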
