JavaScript regex multiline flag doesn't work

Posted 2018-12-31 15:05

I wrote a regex to extract a string from HTML, but the multiline flag doesn't seem to work.

This is my pattern; I want to get the text inside the h1 tag.

var pattern = /<div class="box-content-5">.*<h1>([^<]+?)<\/h1>/mi;
// match() returns the array of capture groups (or null); search() only returns an index
var m = html.match(pattern);
return m[1];

I created a string to test it. When the string contains "\n", the result is always null. If I remove every "\n", I get the right result, with or without the /m flag.

What's wrong with my regex?

4 answers
倾城一夜雪
#2 · 2018-12-31 15:20

[\s\S] did not work for me in Node.js 6.11.3. The RegExp documentation suggests [^] instead, which does work for me:

(The dot, the decimal point) matches any single character except line terminators: \n, \r, \u2028 or \u2029.

Inside a character set, the dot loses its special meaning and matches a literal dot.

Note that the m multiline flag doesn't change the dot behavior. So to match a pattern across multiple lines, the character set [^] can be used (unless you need to support old versions of IE, of course). It matches any character, including newlines.

For example:

/This is on line 1[^]*?This is on line 3/m

where *? is a non-greedy match of zero or more occurrences of [^].
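
To make the difference concrete, here is a minimal sketch comparing the dot with [^] on a multi-line string. The sample text and the variable names are made up for illustration.

```javascript
// Hypothetical three-line sample text, made up for this illustration.
const text = 'This is on line 1\nThis is on line 2\nThis is on line 3';

// The dot stops at line terminators, even with the m flag set.
const dotPattern = /This is on line 1.*?This is on line 3/m;

// [^] is an empty negated class, so it matches any character, newlines included.
const anyCharPattern = /This is on line 1[^]*?This is on line 3/;

const dotResult = dotPattern.test(text);
const anyCharResult = anyCharPattern.test(text);

console.log(dotResult);     // false
console.log(anyCharResult); // true
```

One possible reason [\s\S] can appear to fail is that the pattern was built from a string: with new RegExp('[\s\S]') the backslashes are consumed by the string literal, so they have to be doubled ('[\\s\\S]'). In a regex literal, [\s\S] and [^] behave the same way here.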

余欢
#3 · 2018-12-31 15:36

The dotAll modifier made it into JavaScript in June 2018, with ECMAScript 2018.
https://github.com/tc39/proposal-regexp-dotall-flag

const re = /foo.bar/s; // Or, `const re = new RegExp('foo.bar', 's');`.
re.test('foo\nbar');
// → true
re.dotAll
// → true
re.flags
// → 's'
与风俱净
#4 · 2018-12-31 15:38

You want the s (dotall) modifier, which did not exist in JavaScript before ES2018. In older environments you can replace . with [\s\S], as suggested by @molf. The m (multiline) modifier makes ^ and $ match at line boundaries rather than only at the start and end of the whole string.
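
A small sketch of what the m flag does and does not change. The input string and variable names are made up for illustration.

```javascript
// Hypothetical two-line input, made up for this illustration.
const lines = 'first\nsecond';

// Without m, ^ and $ anchor only at the start and end of the whole string.
const withoutM = /^second$/.test(lines);

// With m, ^ and $ also anchor at the start and end of each line.
const withM = /^second$/m.test(lines);

// m has no effect on the dot: it still refuses to cross the newline.
const dotWithM = /first.second/m.test(lines);

console.log(withoutM, withM, dotWithM); // false true false
```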

ら面具成の殇う
#5 · 2018-12-31 15:41

You are looking for the /.../s modifier, also known as the dotall modifier. It forces the dot . to also match newlines, which it does not do by default.

The bad news is that it does not exist in JavaScript (it does as of ES2018, see below). The good news is that you can work around it by using a character class (e.g. \s) and its negation (\S) together, like this:

[\s\S]

So in your case the regex would become:

/<div class="box-content-5">[\s\S]*<h1>([^<]+?)<\/h1>/i

As of ES2018, JavaScript supports the s (dotAll) flag, so in a modern environment your regular expression could be as you wrote it, but with an s flag at the end (rather than m; m changes how ^ and $ work, not .):

/<div class="box-content-5">.*<h1>([^<]+?)<\/h1>/is
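
Putting both variants side by side, here is a minimal sketch. The HTML fragment and the extractTitle helper are made up for illustration, and the second pattern assumes an engine with ES2018 support.

```javascript
// Hypothetical HTML fragment with line breaks, made up for this illustration.
const html = [
  '<div class="box-content-5">',
  '  <p>intro</p>',
  '  <h1>Hello World</h1>',
  '</div>',
].join('\n');

// Portable workaround: [\s\S] matches any character, including newlines.
const portable = /<div class="box-content-5">[\s\S]*<h1>([^<]+?)<\/h1>/i;

// ES2018 and later: the s flag lets the dot match newlines too.
const dotAll = /<div class="box-content-5">.*<h1>([^<]+?)<\/h1>/is;

// Hypothetical helper: match() returns null when nothing matches,
// so guard before reading capture group 1.
function extractTitle(source, pattern) {
  const m = source.match(pattern);
  return m ? m[1] : null;
}

const portableTitle = extractTitle(html, portable);
const dotAllTitle = extractTitle(html, dotAll);

console.log(portableTitle); // Hello World
console.log(dotAllTitle);   // Hello World
```

The null guard matters because reading m[1] on a failed match throws a TypeError instead of returning null.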