Question:

I need help with regular expression matching using the non-greedy option.

The match pattern is:

<img\s.*>

The text to match is:

<html>
<img src="test">
abc
<img
  src="a" src='a' a=b>
</html>

I am testing on http://regexpal.com.

This expression matches all the text from <img to the last >. I need it to stop at the first > encountered after the initial <img, so here I'd expect two matches instead of the one I get.

I have tried every combination of the non-greedy ?, with no success.

Answer 1:

The non-greedy ? works perfectly fine. You just need to enable the "dot matches all" option in the regex engine you are testing with (regexpal, the engine you used, also has this option). This is because regex engines generally don't let . match line breaks; you need to tell them explicitly that you want . to match line breaks too.

For example,

<img\s.*?>

works fine!

Check the results here.

Also, read about how dot behaves in various regex flavours.
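A minimal sketch of this in Python, assuming Python's re module as a stand-in for regexpal's engine (re.DOTALL is Python's name for the "dot matches all" option). It runs the question's greedy pattern and the lazy one against the question's sample text, both with the flag turned on:

```python
import re

# The sample text from the question; the second <img> tag spans two lines.
text = """<html>
<img src="test">
abc
<img
  src="a" src='a' a=b>
</html>"""

# re.DOTALL makes . match "\n" as well ("dot matches all").
greedy = re.findall(r"<img\s.*>", text, re.DOTALL)
lazy = re.findall(r"<img\s.*?>", text, re.DOTALL)

# Greedy: a single match that runs all the way to the > of </html>.
print(len(greedy))
# Lazy: two matches, each ending at the first > after its <img.
print(lazy)
```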

Answer 2:

The ? modifier makes the match non-greedy: .* is greedy, while .*? is not. So you can use something like <img.*?> to match the whole tag, or <img[^>]*>.
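To see how the two suggestions behave on the question's input, here is a small Python sketch using the re module (the sample text is copied from the question). It runs the lazy pattern both with and without re.DOTALL, and the negated-class pattern with no flag at all:

```python
import re

# The sample text from the question; the second <img> tag spans two lines.
text = """<html>
<img src="test">
abc
<img
  src="a" src='a' a=b>
</html>"""

# Without re.DOTALL, . stops at a newline, so the lazy pattern cannot reach
# the > of the second tag, which sits on the following line.
lazy_no_dotall = re.findall(r"<img.*?>", text)
lazy_dotall = re.findall(r"<img.*?>", text, re.DOTALL)

# [^>] matches any character except >, newlines included, so no flag is needed.
negated_class = re.findall(r"<img[^>]*>", text)

print(lazy_no_dotall)
print(lazy_dotall)
print(negated_class)
```

The negated class gives the same two matches as the lazy pattern with re.DOTALL, without depending on how the engine treats newlines.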

But remember that HTML in general can't actually be parsed with regular expressions.

Answer 3:

Also check the Stack Overflow question "What do lazy and greedy mean in the context of regular expressions?"

Greedy means matching the longest possible string.

Lazy means matching the shortest possible string.

For example, the greedy h.+l matches 'hell' in 'hello', but the lazy h.+?l matches 'hel'.
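The same example checked with Python's re module, as a quick sketch:

```python
import re

word = "hello"

# Greedy: .+ first grabs "ello", then backtracks until an "l" can follow.
greedy = re.search(r"h.+l", word).group()
# Lazy: .+? takes as few characters as possible ("e") before trying "l".
lazy = re.search(r"h.+?l", word).group()

print(greedy, lazy)  # hell hel
```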

Answer 4:

The other answers here presuppose that you have a regex engine which supports non-greedy matching, an extension introduced in Perl 5 and widely copied by other modern languages; but it is by no means ubiquitous. Many older languages and editors only support traditional regular expressions, which have no mechanism for controlling the greediness of the repetition operator *: it always matches the longest possible string.

The trick then is to limit what it's allowed to match in the first place. Instead of .* you seem to be looking for

[^>]*

which still matches as many of something as possible; but the something is no longer . ("any character") but instead "any character which isn't >".

Depending on your application, you may or may not want to enable an option to permit "any character" to include newlines.

Even if your regex engine supports non-greedy matching, it's better to spell out what you actually mean. If this is what you mean, you should probably say so, instead of relying on non-greedy matching to (hopefully, probably) Do What I Mean.

Of course, this is still not what you want if you need to cope with <img title="quoted string with > in it" src="other attributes"> and perhaps <img title="nested tags">, but at that point you should finally give up on using regex for this, like we all told you in the first place.
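A short Python sketch of that failure mode, followed by one way to "give up on regex": it swaps in the standard-library html.parser module, which the answer itself does not name. ImgCollector is a hypothetical helper written only for this illustration, not part of any library:

```python
import re
from html.parser import HTMLParser

html_snippet = '<img title="quoted string with > in it" src="other attributes">'

# The negated class stops at the first >, even one inside a quoted value.
truncated = re.findall(r"<img[^>]*>", html_snippet)
print(truncated)  # ['<img title="quoted string with >']


class ImgCollector(HTMLParser):
    """Hypothetical helper: records the attributes of every <img> start tag."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with the quotes removed.
        if tag == "img":
            self.images.append(attrs)


collector = ImgCollector()
collector.feed(html_snippet)
collector.close()
print(collector.images)
```

The parser understands quoted attribute values, so the > inside the title no longer ends the tag early.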

How can I write a regex which matches non greedy?