Lazy (ungreedy) matching multiple groups using reg

Posted 2020-08-23 01:24

Question:

I would like to grab the contents of any value between pairs of <tag></tag> tags.

<tag>
This is one block of text
</tag>

<tag>
This is another one
</tag>

The regex I have come up with is

/<tag>(.*)<\/tag>/m

However, it appears to be greedy: the group captures everything from the first <tag> up to the very last </tag>. I would like it to be as lazy as possible, so that every time it sees a closing tag it ends the current match and starts over.

How can I write the regex so that I will be able to get multiple matches in the given scenario?

I have included a sample of what I am describing at the following link:

http://rubular.com/r/JW5M3rnqIE

Note: This is not XML, nor is it really based on any existing standard format. I won't need anything sophisticated like a full-fledged library that comes with a nice parser.

Answer 1:

Use this regex pattern:

/<tag>(.*?)<\/tag>/im

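The lazy (non-greedy) quantifier is .*?, not .*. In Ruby, the m flag lets . match newlines as well, which is needed here because each block spans several lines; the i flag makes the match case-insensitive.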
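To see the difference, here is a minimal sketch that runs both quantifiers against the sample from the question. The variable names (text, greedy, lazy) are only illustrative and not part of the original post.

```ruby
# Sample input from the question; variable names are illustrative.
text = "<tag>\nThis is one block of text\n</tag>\n\n<tag>\nThis is another one\n</tag>\n"

# String#[] with a regex and a group index returns that capture group.
greedy = text[/<tag>(.*)<\/tag>/m, 1]   # runs on to the LAST </tag>
lazy   = text[/<tag>(.*?)<\/tag>/m, 1]  # stops at the FIRST </tag>

puts greedy.include?("</tag>")  # the greedy capture swallowed a closing tag
puts lazy.inspect               # only the first block, with its newlines
```

The greedy capture spans both blocks, while the lazy one contains only the first block, including the newlines around it.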

To find multiple occurrences, use:

string.scan(/<tag>(.*?)<\/tag>/im)
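As a rough sketch of how this might look with the sample input: when the pattern contains a capture group, scan returns an array of arrays (one inner array per match), so the example below flattens the result and strips the surrounding whitespace. The names text and blocks are assumptions made for illustration.

```ruby
# Sample input from the question, written as a squiggly heredoc.
text = <<~TEXT
  <tag>
  This is one block of text
  </tag>

  <tag>
  This is another one
  </tag>
TEXT

# scan yields one inner array per match because of the capture group;
# flatten gives plain strings, and strip drops the leading/trailing newlines.
blocks = text.scan(/<tag>(.*?)<\/tag>/im).flatten.map(&:strip)

blocks.each { |block| puts block }
```

Whether to strip the whitespace depends on whether the newlines inside each tag pair are meaningful for your data.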