Print the smallest set of lines between two patterns

Posted 2019-09-03 16:43

Question:

Input file

aaa
Any--END--Pattern
bbb
ANY--BEGIN--PATTERN
ccc                   # do not print
ANY--BEGIN--PATTERN   # print 1
ddd                   # print 2
Any--END--Pattern     # print 3
eee
fff
ANY--BEGIN--PATTERN   # print 4
ggg                   # print 5
Any--END--Pattern     # print 6
hhh                   # print 7
Any--END--Pattern     # print 8
iii                   # do not print
ANY--BEGIN--PATTERN
jjj

Wanted output

ANY--BEGIN--PATTERN   # print 1
ddd                   # print 2
Any--END--Pattern     # print 3
ANY--BEGIN--PATTERN   # print 4
ggg                   # print 5
Any--END--Pattern     # print 6
hhh                   # print 7
Any--END--Pattern     # print 8

Notes

  • Print from the latest ANY--BEGIN--PATTERN before the current Any--END--Pattern.
  • Keep printing up to the last Any--END--Pattern reached before the next ANY--BEGIN--PATTERN.

There are many similar questions, but I cannot find an answer to this issue among them:

  • One-liner to print all lines between two patterns [perl]
  • How to select lines between two patterns? [awk/sed/grep]
  • awk print only lines between two patterns removing first match
  • How to select lines between two marker patterns which may occur multiple times with awk/sed
  • Extract lines between two patterns with awk and a variable regex
  • ...

The answers I have tested from these questions either print the line ccc and/or the line iii, or do not print the lines containing the BEGIN and END patterns. My own attempts have the same defects.

We could write a ten-line script, but I am sure there is an elegant one-line command that solves this issue; I just cannot find it. Therefore I think this could be a good SO question ;-)

I wonder which tricks from sed, awk, perl or any other tool readily available on our Unix-like systems could be used. Please provide a tiny command line using bash, grep, sed, awk, perl or any other tool you can think of...


EDIT:

Just to highlight the pretty simple command line from Sundeep's comment, which simplifies the problem by reversing the input file:

tac input.txt | sed -n '/END/,/BEGIN/p' | tac

However, this command line also prints the beginning of the file
(this case may not arise for other users looking at a similar issue):

aaa
Any--END--Pattern
ANY--BEGIN--PATTERN   # print 1
ddd                   # print 2
Any--END--Pattern     # print 3
ANY--BEGIN--PATTERN   # print 4
ggg                   # print 5
Any--END--Pattern     # print 6
hhh                   # print 7
Any--END--Pattern     # print 8
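
One possible way to drop that leading part while keeping the reversal trick is to buffer each reversed block and print it only once its closing BEGIN line shows up. The following is a minimal sketch, not a command taken from the comments: it assumes GNU tac is available, it uses the sample input with the "# print N" annotations removed, and the names inblk, buf and result are purely illustrative.

```shell
#!/bin/sh
# Sample data: the input file from the question, without the trailing annotations.
input=$(mktemp)
cat > "$input" <<'EOF'
aaa
Any--END--Pattern
bbb
ANY--BEGIN--PATTERN
ccc
ANY--BEGIN--PATTERN
ddd
Any--END--Pattern
eee
fff
ANY--BEGIN--PATTERN
ggg
Any--END--Pattern
hhh
Any--END--Pattern
iii
ANY--BEGIN--PATTERN
jjj
EOF

# Reading bottom-up, an END line opens a block and the next BEGIN line closes it.
# Lines are buffered and printed only when the closing BEGIN is seen, so a block
# still open at the end of the reversed stream (the top of the file) is dropped.
result=$(tac "$input" |
  awk '/END/            { inblk = 1 }
       inblk            { buf = buf $0 ORS }
       inblk && /BEGIN/ { printf "%s", buf; buf = ""; inblk = 0 }' |
  tac)

printf '%s\n' "$result"
rm -f "$input"
```

On the sample this leaves out aaa and the first Any--END--Pattern. A line matching both patterns at once is not handled by this sketch and would need extra care.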

(This answer is used within these C++ coding rules)

Answer 1:

awk to the rescue!

$ awk '/BEGIN/{c=0; b=1} 
              {a[c++]=$0} 
      b&&/END/{for(i=0;i<c;i++) print a[i]; delete a; c=0}' file

ANY--BEGIN--PATTERN   # print 1
ddd                   # print 2
Any--END--Pattern     # print 3
ANY--BEGIN--PATTERN   # print 4
ggg                   # print 5
Any--END--Pattern     # print 6
hhh                   # print 7
Any--END--Pattern     # print 8


Answer 2:

Perl to the rescue!

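#!/usr/bin/perl
use warnings;
use strict;

my $last_end;   # index in @buffer of the last END line seen, if any
my @buffer;     # lines collected since the most recent BEGIN
while (<>) {
    if (/BEGIN/) {
        # A new BEGIN closes the previous block: print it up to its last END.
        print @buffer[ 0 .. $last_end ] if defined $last_end;

        @buffer = $_;
        undef $last_end;
        next;
    }
    $last_end = @buffer if @buffer && /END/;
    push @buffer, $_ if @buffer;
}
# Flush the final block when the input ends without another BEGIN.
print @buffer[ 0 .. $last_end ] if defined $last_end;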

@buffer accumulates the lines from BEGIN, $last_end points to, well, the last END in the buffer, so you can throw away accumulated lines that don't end in an END.

As a one-liner (but why?):

perl -ne 'defined $l && print(@B[0..$l]), (@B, $l) = $_, next if /BEGIN/; $l=@B if @B && /END/; push @B, $_ if @B; END { print @B[0..$l] if defined $l }' file
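
The buffer-plus-last-END idea described above is not tied to Perl. As a rough illustration, here is how it might look in awk; this is a sketch under assumptions, not part of the original answer. The names buf, n, last, flush and result are made up for the example, and the short input is a trimmed variant of the question's sample that ends without a trailing BEGIN, so the end-of-input flush is exercised.

```shell
#!/bin/sh
# Trimmed sample: ends without a new BEGIN, so the last block is flushed at END.
result=$(printf '%s\n' \
    aaa Any--END--Pattern \
    ANY--BEGIN--PATTERN ccc \
    ANY--BEGIN--PATTERN ddd Any--END--Pattern hhh Any--END--Pattern iii |
  awk '
    # Print the buffered lines up to and including the last END line seen.
    function flush(   i) { for (i = 1; i <= last; i++) print buf[i] }

    # Each BEGIN prints the previous block (if it had an END) and restarts the buffer.
    /BEGIN/ { flush(); n = 0; last = 0; buf[++n] = $0; next }

    # Buffer lines only once a BEGIN has been seen; remember where the last END is.
    n       { buf[++n] = $0; if (/END/) last = n }

    # The input may stop without another BEGIN, so flush once more.
    END     { flush() }
  ')

printf '%s\n' "$result"
```

Here ccc is discarded because its block never reaches an END, and iii is discarded because it comes after the last END of its block.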


Answer 3:

This should work with GNU sed:

sed '$b1;/BEGIN/{:1;x;s/\(BEGIN.*END[^\n]*\).*/\1/;t;x;h;d};H;d' file