Regex (grep) for multi-line search needed [duplica

2018-12-31 17:08发布

Possible Duplicate:
How can I search for a multiline pattern in a file ? Use pcregrep

I'm running a grep to find any *.sql file that has the word select followed by the word customerName followed by the word from. This select statement can span many lines and can contain tabs and newlines.

I've tried a few variations on the following:

$ grep -liIr --include="*.sql" --exclude-dir="\.svn*" --regexp="select[a-zA-Z0-
9+\n\r]*customerName[a-zA-Z0-9+\n\r]*from"

This, however, just runs forever. Can anyone help me with the correct syntax please?

3条回答
残风、尘缘若梦
2楼-- · 2018-12-31 17:46

Without the need to install the grep variant pcregrep, you can do multiline search with grep.

$ grep -Pzo "(?s)^(\s*)\N*main.*?{.*?^\1}" *.c

Explanation:

-P activate perl-regexp for grep (a powerful extension of regular extensions)

-z suppress newline at the end of line, subtituting it for null character. That is, grep knows where end of line is, but sees the input as one big line.

-o print only matching. Because we're using -z, the whole file is like a single big line, so if there is a match, the entire file would be printed; this way it won't do that.

In regexp:

(?s) activate PCRE_DOTALL, which means that . finds any character or newline

\N find anything except newline, even with PCRE_DOTALL activated

.*? find . in nongreedy mode, that is, stops as soon as possible.

^ find start of line

\1 backreference to first group (\s*) This is a try to find same indentation of method

As you can imagine, this search prints the main method in a C (*.c) source file.

查看更多
冷夜・残月
3楼-- · 2018-12-31 17:46

Your fundamental problem is that grep works one line at a time - so it cannot find a SELECT statement spread across lines.

Your second problem is that the regex you are using doesn't deal with the complexity of what can appear between SELECT and FROM - in particular, it omits commas, full stops (periods) and blanks, but also quotes and anything that can be inside a quoted string.

I would likely go with a Perl-based solution, having Perl read 'paragraphs' at a time and applying a regex to that. The downside is having to deal with the recursive search - there are modules to do that, of course, including the core module File::Find.

In outline, for a single file:

$/ = "\n\n";    # Paragraphs

while (<>)
{
     if ($_ =~ m/SELECT.*customerName.*FROM/mi)
     {
         printf file name
         go to next file
     }
}

That needs to be wrapped into a sub that is then invoked by the methods of File::Find.

查看更多
旧时光的记忆
4楼-- · 2018-12-31 17:59

I am not very good in grep. But your problem can be solved using AWK command. Just see

awk '/select/,/from/' *.sql

The above code will result from first occurence of select till first sequence of from. Now you need to verify whether returned statements are having customername or not. For this you can pipe the result. And can use awk or grep again.

查看更多
登录 后发表回答