How can I match multi-line patterns in the command

I regularly use regex to transform text.

To transform giant text files from the command line, Perl lets me do this:

perl -pe 's/.../.../g' < in.txt > out.txt

But this inherently works line by line. Occasionally, I want to match patterns that span multiple lines.

How can I do this on the command line?

Tags: regex perl multiline command-line-tool

2 Answers

#2 · 2020-04-11 11:18

To slurp a file instead of processing it line by line, use the -0777 switch:

perl -0777 -pe 's/.../.../g' in.txt > out.txt

As documented in perlrun #Command Switches:

The special value -00 will cause Perl to slurp files in paragraph mode. Any value -0400 or above will cause Perl to slurp files whole, but by convention the value -0777 is the one normally used for this purpose.

Obviously, for large files this may not work well, in which case you'll need to code some type of buffer to do this replacement. We can't advise any better though without real information about your intent.


神经病院院长

#3 · 2020-04-11 11:37

Grepping across line boundaries

So you want to grep across line boundaries...

You quite possibly already have pcregrep installed. As you may know, PCRE stands for Perl-Compatible Regular Expressions, and the library is definitely Perl-style, though not identical to Perl.

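To match across multiple lines, you have to turn on multi-line mode with -M, which is not the same as (?m). The (?m) modifier only changes where ^ and $ can match; -M is what lets pcregrep give the pattern more than one line of input to match against.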

Running pcregrep -M "(?s)^b.*\d+" text.txt

On this text file:

a
b
c11

The output will be

b
c11

whereas grep would return empty.

Excerpt from the doc:

-M, --multiline Allow patterns to match more than one line. When this option is given, patterns may usefully contain literal newline characters and internal occurrences of ^ and $ characters. The output for a successful match may consist of more than one line, the last of which is the one in which the match ended. If the matched string ends with a newline sequence the output ends at the end of that line.

When this option is set, the PCRE library is called in "multiline" mode. There is a limit to the number of lines that can be matched, imposed by the way that pcregrep buffers the input file as it scans it. However, pcregrep ensures that at least 8K characters or the rest of the document (whichever is the shorter) are available for forward matching, and similarly the previous 8K characters (or all the previous characters, if fewer than 8K) are guaranteed to be available for lookbehind assertions. This option does not work when input is read line by line (see --line-buffered.)
