How can I search for a multiline pattern in a file

Posted 2019-01-01 01:16

Question:

I needed to find all the files that contained a specific string pattern. The first solution that comes to mind is using find piped with xargs grep:

find . -iname '*.py' | xargs grep -e 'YOUR_PATTERN'

But if I need to find a pattern that spans more than one line, I'm stuck, because vanilla grep works line by line and can't match multiline patterns.
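A side note on the find and xargs pipeline in the question: it splits file names on whitespace, so a name containing a space is passed to grep in pieces. The sketch below shows a NUL-separated variant. It assumes a find and xargs that support -print0 and -0 (GNU and BSD versions do), and the directory and file name are made up for illustration.

```shell
# Work in a throwaway directory; the file name below is hypothetical.
dir=$(mktemp -d)
printf 'YOUR_PATTERN here\n' > "$dir/my module.py"

# -print0 and -0 pass file names NUL-separated, so the space survives xargs.
# grep -l prints only the names of files that contain a match.
out=$(cd "$dir" && find . -iname '*.py' -print0 | xargs -0 grep -l -e 'YOUR_PATTERN')
printf '%s\n' "$out"

rm -rf "$dir"
```

The same -print0 and -0 pair can be dropped into the multiline commands in the answers, since only the grep stage changes.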

Answer 1:

I discovered pcregrep, which stands for Perl Compatible Regular Expressions GREP.

For example, say you need to find files where the '_name' variable is immediately followed by the '_description' variable:

find . -iname '*.py' | xargs pcregrep -M '_name.*\n.*_description'

Tip: you need to include the line break character in your pattern. Depending on your platform, it could be '\n', '\r', '\r\n', ...



Answer 2:

Why don't you go for awk:

awk '/Start pattern/,/End pattern/' filename
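To make the behaviour of the awk range pattern concrete, here is a small sketch run against a throwaway file. The file contents are invented for illustration; only the awk command itself comes from the answer.

```shell
# Create a throwaway sample file (hypothetical content, for illustration only).
tmp=$(mktemp)
printf '%s\n' 'header' 'Start pattern' 'body line' 'End pattern' 'footer' > "$tmp"

# Print every line from a match of the start regex
# through the next match of the end regex, inclusive.
out=$(awk '/Start pattern/,/End pattern/' "$tmp")
printf '%s\n' "$out"

rm -f "$tmp"
```

Here the command prints the three lines from 'Start pattern' to 'End pattern' and skips 'header' and 'footer'. Note that a range whose end regex never matches runs to the end of the file.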


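Answer 3:

Here is an example using GNU grep:

grep -Pzo '_name.*\n.*_description' filename

-z/--null-data: treat input and output data as sequences of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline. For an ordinary text file this makes the whole file a single "line", so the pattern can match across line breaks.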
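Because -z also terminates each output match with a NUL byte rather than a newline, the results are easier to read when piped through tr. The following is a minimal sketch, assuming GNU grep built with PCRE support (-P); the sample file content is hypothetical.

```shell
# Throwaway sample file (hypothetical content, for illustration only).
tmp=$(mktemp)
printf '%s\n' 'class A:' "    _name = 'a'" "    _description = 'b'" 'other' > "$tmp"

# -z treats the whole file as one record, so \n in the pattern can match;
# -o prints only the matched text, terminated by NUL;
# tr turns that NUL terminator into a newline for display.
out=$(grep -Pzo '_name.*\n.*_description' "$tmp" | tr '\0' '\n')
printf '%s\n' "$out"

rm -f "$tmp"
```

The match starts at '_name' on one line and ends at '_description' on the next, because '.' does not cross a line break unless the (?s) modifier is set.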




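Answer 4:

grep -P also uses libpcre, and is much more widely installed than pcregrep. To find a complete title section of an HTML document, even if it spans multiple lines, you can use the following. The -z option is needed because grep otherwise reads one line at a time, and -o prints only the matched text instead of the whole file:

grep -Pzo '(?s)<title>.*?</title>' example.html

Since the PCRE project follows Perl's regular expression syntax, use the Perl documentation for reference: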

  • http://perldoc.perl.org/perlre.html#Modifiers
  • http://perldoc.perl.org/perlre.html#Extended-Patterns


Answer 5:

Here is a more useful example:

pcregrep -Mi "<title>(.*\n){0,5}.*</title>" afile.html

It finds the title tag in an HTML file even if it spans several lines (here, up to five line breaks).

Here is an example with no limit on the number of lines:

pcregrep -Mi "(?s)<title>.*</title>" example.html


Answer 6:

With The Silver Searcher:

ag 'abc.*(\n|.)*efg'

The Silver Searcher's speed optimizations may pay off here.



Answer 7:

You can use the grep alternative sift here (disclaimer: I am the author).

It supports multiline matching and limiting the search to specific file types out of the box:

sift -m --files '*.py' 'YOUR_PATTERN'

(search all *.py files for the specified multiline regex pattern)

It is available for all major operating systems. Take a look at the samples page to see how it can be used to extract multiline values from an XML file.



Answer 8:

This answer might be useful:

"Regex (grep) for multi-line search needed"

To search recursively, you can use the flags -R (recursive) and --include (glob pattern). See:

"Use grep --exclude/--include syntax to not grep through certain files"



Answer 9:

perl -ne 'print if (/begin pattern/../end pattern/)' filename


Answer 10:

Using the ex/vi editor, optionally with the shell's globstar option (the range syntax is similar to awk and sed):

ex +"/aaa/,/bbb/p" -R -scq! file.txt

where aaa is your starting text, and bbb is your ending text.

To search recursively, try:

ex +"/aaa/,/bbb/p" -scq! **/*.py

Note: To enable the ** syntax in Bash 4 or later, run shopt -s globstar; zsh supports ** without it.



Answer 11:

@Marcin: a non-greedy awk example that stops after the first matching block:

awk '{if ($0 ~ /Start pattern/) {triggered=1;}if (triggered) {print; if ($0 ~ /End pattern/) { exit;}}}' filename
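The difference between this script and the plain range pattern shows up when the file holds more than one block. The sketch below runs both on a throwaway file with two blocks. The file content and the variable names first_only and all_blocks are invented for illustration.

```shell
# Throwaway sample file with two blocks (hypothetical content).
tmp=$(mktemp)
printf '%s\n' 'Start pattern' 'one' 'End pattern' 'gap' \
              'Start pattern' 'two' 'End pattern' > "$tmp"

# The script from this answer: exits after the first End pattern line.
first_only=$(awk '{if ($0 ~ /Start pattern/) {triggered=1} if (triggered) {print; if ($0 ~ /End pattern/) {exit}}}' "$tmp")

# The plain range pattern: prints every Start..End block in the file.
all_blocks=$(awk '/Start pattern/,/End pattern/' "$tmp")

printf 'first block only:\n%s\n\nall blocks:\n%s\n' "$first_only" "$all_blocks"

rm -f "$tmp"
```

The first command prints three lines and stops reading, which can also save time on large files. The second prints both blocks, six lines in total, and skips the 'gap' line.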