For example, this regex
(.*)<FooBar>
will match:
abcde<FooBar>
But how do I get it to match across multiple lines?
abcde
fghij<FooBar>
In Ruby, you can use the 'm' option (multiline). See the Regexp documentation on ruby-doc.org for more information.
In Java-based regular expressions, you can use
[\s\S]
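The character-class trick is not tied to Java. As a minimal sketch, here it is in Python (chosen only for illustration), using the sample string from the question; the variable names are made up for this example:

```python
import re

text = "abcde\nfghij<FooBar>"

# [\s\S] means "whitespace or non-whitespace", i.e. any character,
# so it crosses the newline without any flag being set.
match = re.search(r"([\s\S]*)<FooBar>", text)
captured = match.group(1) if match else None
print(repr(captured))
```

Because no flag is involved, the same pattern can be reused in engines that lack a DOTALL-style option.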
In the context of use within languages, regular expressions act on strings, not lines. So you should be able to use the regex normally, assuming that the input string has multiple lines.
In this case, the given regex will match the entire string, since "<FooBar>" is present. Depending on the specifics of the regex implementation, the $1 value (obtained from the "(.*)") will either be "fghij" or "abcde\nfghij". As others have said, some implementations allow you to control whether the "." will match the newline, giving you the choice.
Line-based regular expression use is usually for command line things like egrep.
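The two possible $1 values described above can be reproduced with a short Python sketch. Python is used here only as one example of an engine that lets you choose; the variable names are illustrative:

```python
import re

text = "abcde\nfghij<FooBar>"
pattern = r"(.*)<FooBar>"

# Default: '.' does not match '\n', so the group starts after the newline.
default_match = re.search(pattern, text)
# DOTALL: '.' matches '\n' too, so the group spans both lines.
dotall_match = re.search(pattern, text, re.DOTALL)

default_group = default_match.group(1)
dotall_group = dotall_match.group(1)
print(repr(default_group))
print(repr(dotall_group))
```

In both cases the regex finds a match in the multi-line string; only the captured text differs.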
In Eclipse, the following expression worked:
Regular-Expression:
If you're using Eclipse search, you can enable the "DOTALL" option to make '.' match any character including line delimiters: just add "(?s)" at the beginning of your search string. Example:
The question is, can the '.' pattern match any character? The answer varies from engine to engine. The main difference is whether the pattern is used by a POSIX or non-POSIX regex library.
Special note about Lua patterns: they are not considered regular expressions, but '.' matches any char there, same as in POSIX-based engines.
Another note on MATLAB and Octave: '.' matches any char by default (demo):
str = "abcde\n fghij<Foobar>"; expression = '(.*)<Foobar>*'; [tokens,matches] = regexp(str,expression,'tokens','match');
(tokens contains an abcde\n fghij item).
Also, in all of Boost's regex grammars the dot matches line breaks by default. Boost's ECMAScript grammar allows you to turn this off with regex_constants::no_mod_m (source).
As for Oracle (it is POSIX based), use the n option (demo):
select regexp_substr('abcde' || chr(10) ||' fghij<Foobar>', '(.*)<Foobar>', 1, 1, 'n', 1) as results from dual
POSIX-based engines:
A mere '.' already matches line breaks, so there is no need to use any modifiers; see bash (demo).
Tcl (demo), PostgreSQL (demo), and R (TRE, the base R default engine with no perl=TRUE; for base R with perl=TRUE or for stringr/stringi patterns, use the (?s) inline modifier) (demo) also treat '.' the same way.
However, most POSIX-based tools process input line by line. Hence, '.' does not match the line breaks just because they are not in scope. Here are some examples of how to override this:
- sed 'H;1h;$!d;x; s/\(.*\)><Foobar>/\1/' (H;1h;$!d;x; slurps the file into memory). If whole lines must be included, sed '/start_pattern/,/end_pattern/d' file (removing from start will end with matched lines included) or sed '/start_pattern/,/end_pattern/{{//!d;};}' file (with matching lines excluded) can be considered.
- perl -0pe 's/(.*)<FooBar>/$1/gs' <<< "$str" (-0 slurps the whole file into memory, -p prints the file after applying the script given by -e). Note that using -000pe will slurp the file and activate 'paragraph mode', where Perl uses consecutive newlines (\n\n) as the record separator.
- grep -Poz '(?si)abc\K.*?(?=<Foobar>)' file. Here, z enables file slurping, (?s) enables the DOTALL mode for the '.' pattern, (?i) enables case insensitive mode, \K omits the text matched so far, *? is a lazy quantifier, and (?=<Foobar>) matches the location before <Foobar>.
- pcregrep -Mi "(?si)abc\K.*?(?=<Foobar>)" file (M enables file slurping here). Note pcregrep is a good solution for Mac OS grep users.
See demos.
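The "line breaks are not in scope" point can be illustrated without any of the tools above. The following is a rough Python sketch, not a model of how sed or grep work internally. It compares matching each line separately with matching the whole input at once; the sample text and names are assumptions made for the example:

```python
import re

text = "abcde\nfghij<FooBar>\n"
pattern = re.compile(r"abcde.*<FooBar>", re.DOTALL)

# Line-oriented processing: every line is matched on its own, so a pattern
# that needs text from two lines never sees both of them together.
per_line_hits = [line for line in text.splitlines() if pattern.search(line)]

# "Slurping": the whole input is matched as a single string.
whole_hit = pattern.search(text)
whole_text = whole_hit.group(0) if whole_hit else None

print(per_line_hits)
print(repr(whole_text))
```

Even with DOTALL enabled, the per-line loop finds nothing, which is why the sed, perl, and grep examples above all start by reading the entire input.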
Non-POSIX-based engines:
- PHP: use the s modifier (PCRE_DOTALL) (demo): preg_match('~(.*)<Foobar>~s', $s, $m)
- C#: use the RegexOptions.Singleline flag (demo):
  - var result = Regex.Match(s, @"(.*)<Foobar>", RegexOptions.Singleline).Groups[1].Value;
  - var result = Regex.Match(s, @"(?s)(.*)<Foobar>").Groups[1].Value;
- PowerShell: use the (?s) inline option: $s = "abcde`nfghij<FooBar>"; $s -match "(?s)(.*)<Foobar>"; $matches[1]
- Perl: use the s modifier (or the (?s) inline version at the start) (demo): /(.*)<FooBar>/s
- Python: use the re.DOTALL (or re.S) flags or the (?s) inline modifier (demo): m = re.search(r"(.*)<FooBar>", s, flags=re.S) (and then if m:, print(m.group(1)))
- Java: use the Pattern.DOTALL modifier (or the inline (?s) flag) (demo): Pattern.compile("(.*)<FooBar>", Pattern.DOTALL)
- Groovy: use the (?s) in-pattern modifier (demo): regex = /(?s)(.*)<FooBar>/
- Scala: use the (?s) modifier (demo): "(?s)(.*)<Foobar>".r.findAllIn("abcde\n fghij<Foobar>").matchData foreach { m => println(m.group(1)) }
- JavaScript: use [^] or the workarounds [\d\D] / [\w\W] / [\s\S] (demo): s.match(/([\s\S]*)<FooBar>/)[1]
- C++ (std::regex): use [\s\S] or the JS workarounds (demo): regex rex(R"(([\s\S]*)<FooBar>)");
- ([\s\S]*)<Foobar>
- Ruby: use the /m MULTILINE modifier (demo): s[/(.*)<Foobar>/m, 1]
- Go: use the (?s) inline modifier at the start (demo): re := regexp.MustCompile(`(?s)(.*)<FooBar>`)
- Swift: use dotMatchesLineSeparators or (easier) pass the (?s) inline modifier to the pattern: let rx = "(?s)(.*)<Foobar>"
- Objective-C: (?s) works the easiest, but here is how the option can be used: NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern:pattern options:NSRegularExpressionDotMatchesLineSeparators error:&regexError];
- RE2: use the (?s) modifier (demo): "(?s)(.*)<Foobar>" (in Google Spreadsheets, =REGEXEXTRACT(A2,"(?s)(.*)<Foobar>"))
NOTES ON (?s):
In most non-POSIX engines, the (?s) inline modifier (or embedded flag option) can be used to force '.' to match line breaks.
If placed at the start of the pattern, (?s) changes the behavior of every '.' in the pattern. If (?s) is placed somewhere after the beginning, only the '.' tokens located to the right of it are affected, unless the pattern is passed to Python re. In Python re, regardless of the (?s) location, every '.' in the whole pattern is affected (and since Python 3.11, a global inline flag that is not at the start of the pattern is an error). The (?s) effect is stopped using (?-s). A modifier group can be used to affect only a specified range of a regex pattern (e.g. Delim1(?s:.*?)\nDelim2.* will make the first .*? match across newlines, while the second .* will only match the rest of the line).
POSIX note:
In non-POSIX regex engines, to match any char, the [\s\S] / [\d\D] / [\w\W] constructs can be used.
In POSIX, [\s\S] does not match any char (as it does in JavaScript or any non-POSIX engine) because regex escape sequences are not supported inside bracket expressions. [\s\S] is parsed as a bracket expression that matches a single char: \, s, or S.
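Returning to the notes on (?s): the difference between a modifier group and a leading (?s) can be checked with a small Python sketch. It assumes Python 3.6 or later, where re accepts the (?s:...) group syntax; the sample text and variable names are invented for the example:

```python
import re

text = "Delim1 a\nb\nDelim2 c\nd"

# (?s:...) enables DOTALL only inside the group; the trailing .* is
# unaffected and stops at the end of the "Delim2" line.
scoped = re.search(r"Delim1(?s:.*?)\nDelim2.*", text)
# (?s) at the very start applies to every '.' in the pattern,
# so the trailing .* runs to the end of the input.
leading = re.search(r"(?s)Delim1.*?\nDelim2.*", text)

scoped_text = scoped.group(0)
leading_text = leading.group(0)
print(repr(scoped_text))
print(repr(leading_text))
```

The scoped form is useful when only one part of a pattern should cross line breaks.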