Regular expression to match all characters between

Posted 2019-03-09 15:33

Question:

I'm using the Sublime Text 2 editor. I would like to use a regex to match all characters between all h1 tags.

Right now I'm using this:

<h1>.+</h1>

It works fine if the h1 tag doesn't contain line breaks.

For example, with

<h1>Hello this is a header</h1>

it works fine.

But it doesn't work if the tag looks like this:

<h1>
   Hello this is a header
</h1>

Can someone help me with the syntax?

Answer 1:

By default, . matches every character except a newline.

In this case, you need the DOTALL option, which makes . match any character, including a newline. The DOTALL option can be specified inline as (?s). For example:

(?s)<h1>.+</h1>

However, this alone will not work when the text contains more than one h1 tag: quantifiers are greedy by default (here, +), so .+ consumes as many characters as possible and the match runs from the first <h1> to the last </h1>. Make the quantifier lazy (consume as few characters as possible) by adding a ? after it, giving +?:

(?s)<h1>.+?</h1>
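To see the difference between the three patterns, here is a minimal sketch using Python's re module rather than Sublime Text, on the assumption that (?s) and +? behave the same way in both engines. The sample text and variable names are made up for illustration.

```python
import re

# Made-up sample: one multi-line h1 and one single-line h1.
html = "<h1>\n   First header\n</h1>\n<p>Some text</p>\n<h1>Second header</h1>"

# Without DOTALL, . stops at newlines, so only the single-line h1 matches.
no_dotall = re.findall(r"<h1>.+</h1>", html)

# With (?s), the greedy .+ runs from the first <h1> to the last </h1>.
greedy = re.findall(r"(?s)<h1>.+</h1>", html)

# With (?s) and the lazy .+?, each h1 element is matched separately.
lazy = re.findall(r"(?s)<h1>.+?</h1>", html)

print(no_dotall)
print(len(greedy), len(lazy))
```

The greedy pattern yields a single match spanning the whole sample, while the lazy one yields one match per h1 tag.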

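Alternatively, the regex can be <h1>[^<>]*</h1>. The negated character class [^<>] also matches newlines, so you don't need to specify any option. Note that this only matches h1 tags that contain no nested tags.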
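A quick check of the character-class version, again sketched in Python's re module as a stand-in for the editor's regex engine. The two sample strings are made up; the second one has a nested tag to show where this pattern stops matching.

```python
import re

pattern = re.compile(r"<h1>[^<>]*</h1>")

# [^<>] matches newlines too, so no DOTALL option is needed here.
multiline = "<h1>\n   Hello this is a header\n</h1>"

# A nested tag contains < and >, which [^<>] refuses to match.
nested = "<h1>Hello <em>this</em> is a header</h1>"

multiline_matches = pattern.findall(multiline)
nested_matches = pattern.findall(nested)

print(multiline_matches)
print(nested_matches)
```

If the h1 tags may contain other markup, the lazy (?s) pattern is likely the safer choice.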



Answer 2:

Since this question is the top Google result for a regex that finds all the characters between h1 tags, I thought I would give that answer as well, since that was what I was looking for.

(?s)(?<=<h1>)(.+?)(?=</h1>)

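Used on a sample text like <h1>A title</h1> <p>Some content</p> <h1>Another title</h1>, that regex returns only the text inside the tags, without the tags themselves: A title for the first match (and Another title if you search for all matches).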
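As a rough illustration, the lookaround pattern can be tried in Python's re module, which supports the same (?<=...) and (?=...) syntax for fixed-width lookbehind. The variable names are hypothetical; the pattern and sample text are the ones from this answer.

```python
import re

sample = "<h1>A title</h1> <p>Some content</p> <h1>Another title</h1>"

# The lookbehind and lookahead assert the tags without consuming them,
# so the match itself contains only the text between the tags.
inner = re.compile(r"(?s)(?<=<h1>)(.+?)(?=</h1>)")

# A single search stops at the first h1.
first = inner.search(sample).group(1)

# A find-all returns the content of every h1.
all_titles = inner.findall(sample)

print(first)
print(all_titles)
```

A single search gives the first title only, while a find-all (or "Find All" in an editor) gives each title in turn.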