How do I match any character across multiple lines

Posted 2018-12-31 00:14

For example, this regex

(.*)<FooBar>

will match:

abcde<FooBar>

But how do I get it to match across multiple lines?

abcde
fghij<FooBar>

20 answers
宁负流年不负卿
#2 · 2018-12-31 00:19

In Ruby you can use the 'm' option (multiline):

/YOUR_REGEXP/m

See the Regexp documentation on ruby-doc.org for more information.

有味是清欢
#3 · 2018-12-31 00:19

In Java-based regular expressions, you can use [\s\S] to match any character, including line breaks.
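The same character-class trick works in most non-POSIX engines. As a minimal sketch in Python rather than Java (the sample string and the variable names are assumptions chosen for illustration):

```python
import re

# Sample input from the question: two lines, the tag on the second one.
text = "abcde\nfghij<FooBar>"

# [\s\S] reads as "whitespace or non-whitespace", i.e. any character at all,
# so it crosses line breaks without needing any flag.
match = re.search(r"([\s\S]*)<FooBar>", text)
captured = match.group(1) if match else None

print(repr(captured))
```

Here group 1 should cover both lines, newline included.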

千与千寻千般痛.
#4 · 2018-12-31 00:19

Within programming languages, regular expressions act on strings, not lines. So you should be able to use the regex normally, provided the input string contains multiple lines.

In this case, the given regex will match the entire string, since "<FooBar>" is present. Depending on the specifics of the regex implementation, the $1 value (obtained from the "(.*)") will either be "fghij" or "abcde\nfghij". As others have said, some implementations allow you to control whether the "." will match the newline, giving you the choice.
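To make the two possible $1 values concrete, here is a small sketch in Python, one engine where the choice is controlled by a flag. The variable names are illustrative only:

```python
import re

text = "abcde\nfghij<FooBar>"
pattern = r"(.*)<FooBar>"

# Default: "." stops at "\n", so group 1 only covers the last line.
default_group = re.search(pattern, text).group(1)

# With DOTALL, "." also matches "\n", so group 1 spans both lines.
dotall_group = re.search(pattern, text, flags=re.DOTALL).group(1)

print(repr(default_group))
print(repr(dotall_group))
```

In both cases the regex finds a match in the multi-line string; only the extent of the captured group differs.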

Line-based matching is mostly a feature of command-line tools such as egrep.

爱死公子算了
#5 · 2018-12-31 00:26

In Eclipse, the following expression worked for me. Text to match:

Foo

jadajada Bar"

Regular expression:

Foo[\S\s]{1,10}.*Bar*
伤终究还是伤i
#6 · 2018-12-31 00:27

If you're using Eclipse search, you can enable the "DOTALL" option to make '.' match any character including line delimiters: just add "(?s)" at the beginning of your search string. Example:

(?s).*<FooBar>
孤独总比滥情好
#7 · 2018-12-31 00:30

The question is: can the . pattern match any character? The answer varies from engine to engine. The main difference is whether the pattern is handled by a POSIX or a non-POSIX regex library.

Special note about Lua patterns: they are not considered regular expressions, but . matches any char there, the same as in POSIX-based engines.

Another note on MATLAB and Octave: the . matches any char by default:

str = "abcde\n fghij<Foobar>"; expression = '(.*)<Foobar>*'; [tokens,matches] = regexp(str,expression,'tokens','match');

(tokens then contains an abcde\n fghij item).

Also, in all of Boost's regex grammars the dot matches line breaks by default. Boost's ECMAScript grammar allows you to turn this off with regex_constants::no_mod_m.

As for Oracle (it is POSIX based), use the n option:

select regexp_substr('abcde' || chr(10) ||' fghij<Foobar>', '(.*)<Foobar>', 1, 1, 'n', 1) as results from dual

POSIX-based engines:

A plain . already matches line breaks; no modifiers are needed.

R's default engine (TRE, used by base R without perl=TRUE) treats . the same way. For base R with perl=TRUE, or for stringr/stringi patterns, use the (?s) inline modifier.

However, most POSIX-based tools process input line by line. Hence, . does not match line breaks simply because they are never in scope. Here are some examples of how to override this:

  • sed - There are multiple workarounds. The most precise, though not very safe, is sed 'H;1h;$!d;x; s/\(.*\)<Foobar>/\1/' (H;1h;$!d;x; slurps the file into memory). If whole lines must be processed, consider sed '/start_pattern/,/end_pattern/d' file (deletes the range with the matching start and end lines included) or sed '/start_pattern/,/end_pattern/{{//!d;};}' file (deletes the range but keeps the matching start and end lines).
  • perl - perl -0pe 's/(.*)<FooBar>/$1/gs' <<< "$str" (-0 slurps the whole file into memory, -p prints the file after applying the script given by -e). Note that using -000pe will slurp the file and activate 'paragraph mode', where Perl uses consecutive newlines (\n\n) as the record separator.
  • grep - grep -Poz '(?si)abc\K.*?(?=<Foobar>)' file. Here, z enables file slurping, (?s) enables the DOTALL mode for the . pattern, (?i) enables case-insensitive mode, \K omits the text matched so far, *? is a lazy quantifier, and (?=<Foobar>) matches the location before <Foobar>.
  • pcregrep - pcregrep -Mi "(?si)abc\K.*?(?=<Foobar>)" file (M enables file slurping here). Note that pcregrep is a good solution for Mac OS grep users.
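Why slurping matters can be sketched outside the shell as well. The Python snippet below is only an illustration of the idea, not of how sed or grep work internally: it applies one DOTALL pattern first to each line separately, then to the whole input joined into one string. The input lines and names are made up for the example.

```python
import re

# What a line-oriented tool sees: one record per line.
lines = ["abcde\n", "fghij<FooBar>\n"]
pattern = re.compile(r"abcde.*<FooBar>", re.DOTALL)

# Line by line: no single line holds both "abcde" and "<FooBar>",
# so even a dot that matches newlines has nothing to cross.
per_line_hits = [line for line in lines if pattern.search(line)]

# Slurped: the whole input is one string, so the newline is in scope.
slurped_hit = pattern.search("".join(lines))

print(per_line_hits)
print(slurped_hit is not None)
```

The per-line pass finds nothing, while the slurped pass matches across the line break.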


Non-POSIX-based engines:

  • PHP - Use the s modifier (PCRE_DOTALL): preg_match('~(.*)<Foobar>~s', $s, $m)
  • C# - Use the RegexOptions.Singleline flag or the (?s) inline option:
    - var result = Regex.Match(s, @"(.*)<Foobar>", RegexOptions.Singleline).Groups[1].Value;
    - var result = Regex.Match(s, @"(?s)(.*)<Foobar>").Groups[1].Value;
  • PowerShell - Use the (?s) inline option: $s = "abcde`nfghij<FooBar>"; $s -match "(?s)(.*)<Foobar>"; $matches[1]
  • Perl - Use the s modifier (or the (?s) inline version at the start): /(.*)<FooBar>/s
  • Python - Use the re.DOTALL (or re.S) flag or the (?s) inline modifier: m = re.search(r"(.*)<FooBar>", s, flags=re.S) (and then if m:, print(m.group(1)))
  • Java - Use the Pattern.DOTALL modifier (or the inline (?s) flag): Pattern.compile("(.*)<FooBar>", Pattern.DOTALL)
  • Groovy - Use the (?s) in-pattern modifier: regex = /(?s)(.*)<FooBar>/
  • Scala - Use the (?s) modifier: "(?s)(.*)<Foobar>".r.findAllIn("abcde\n fghij<Foobar>").matchData foreach { m => println(m.group(1)) }
  • JavaScript - Use [^] or the workarounds [\d\D] / [\w\W] / [\s\S]: s.match(/([\s\S]*)<FooBar>/)[1]
  • C++ (std::regex) - Use [\s\S] or the JS workarounds: regex rex(R"(([\s\S]*)<FooBar>)");
  • - Use the same approach as in JavaScript, ([\s\S]*)<Foobar>.
  • Ruby - Use the /m MULTILINE modifier: s[/(.*)<Foobar>/m, 1]
  • Go - Use the inline modifier (?s) at the start: re := regexp.MustCompile(`(?s)(.*)<FooBar>`)
  • Swift - Use dotMatchesLineSeparators or (easier) pass the (?s) inline modifier to the pattern: let rx = "(?s)(.*)<Foobar>"
  • Objective-C - Same as Swift, (?s) is the easiest, but here is how the option can be used: NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern:pattern options:NSRegularExpressionDotMatchesLineSeparators error:&regexError];
  • RE2, Google Apps Script - Use the (?s) modifier: "(?s)(.*)<Foobar>" (in Google Spreadsheets, =REGEXEXTRACT(A2,"(?s)(.*)<Foobar>"))
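Taking the Python entry from the list as an example, the flag and the inline modifier are interchangeable. A short sketch, with sample text and variable names chosen only for illustration:

```python
import re

text = "abcde\nfghij<FooBar>"

# Three spellings of the same request: let "." match line breaks too.
by_flag = re.search(r"(.*)<FooBar>", text, flags=re.DOTALL).group(1)
by_alias = re.search(r"(.*)<FooBar>", text, flags=re.S).group(1)
by_inline = re.search(r"(?s)(.*)<FooBar>", text).group(1)

print(by_flag == by_alias == by_inline)
```

The inline form is handy when only the pattern string can be supplied, for example in a search dialog or a configuration file.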

NOTES ON (?s):

In most non-POSIX engines, the (?s) inline modifier (or embedded flag option) can be used to make . match line breaks.

If placed at the start of the pattern, (?s) changes the behavior of every . in the pattern. If (?s) is placed somewhere after the beginning, only the . to the right of it are affected, unless the pattern is passed to Python re. In Python re, a global (?s) affects every . in the pattern regardless of its location, and since Python 3.11 a global inline flag placed anywhere but the start of the pattern is an error. The effect of (?s) is switched off with (?-s). A modifier group can be used to affect only a specified part of a regex pattern (e.g. Delim1(?s:.*?)\nDelim2.* makes the first .*? match across newlines, while the second .* only matches the rest of the line).
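The modifier-group example can be tried as follows. This is a sketch in Python, assuming a version that supports scoped inline flags such as (?s:...) (3.6 or later); the sample text is invented for the demonstration.

```python
import re

text = "Delim1 a\nb\nDelim2 rest of line\nnext line"

# (?s:...) turns DOTALL on only inside the group; the trailing ".*" keeps
# the default behaviour and stops at the end of its line.
match = re.search(r"Delim1(?s:.*?)\nDelim2.*", text)
matched = match.group(0) if match else None

print(repr(matched))
```

The match should run from Delim1 across the line breaks up to the end of the Delim2 line, leaving "next line" out.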

POSIX note:

In non-POSIX regex engines, the [\s\S] / [\d\D] / [\w\W] constructs can be used to match any char.

In POSIX, [\s\S] does not match any char (as it does in JavaScript or any non-POSIX engine) because regex escape sequences are not supported inside bracket expressions. [\s\S] is parsed as a bracket expression that matches a single char: \, s, or S.
