Print the smallest set of lines between two patterns

Posted 2019-09-03 16:22

Input file

aaa
Any--END--Pattern
bbb
ANY--BEGIN--PATTERN
ccc                   # do not print
ANY--BEGIN--PATTERN   # print 1
ddd                   # print 2
Any--END--Pattern     # print 3
eee
fff
ANY--BEGIN--PATTERN   # print 4
ggg                   # print 5
Any--END--Pattern     # print 6
hhh                   # print 7
Any--END--Pattern     # print 8
iii                   # do not print
ANY--BEGIN--PATTERN
jjj

Wanted output

ANY--BEGIN--PATTERN   # print 1
ddd                   # print 2
Any--END--Pattern     # print 3
ANY--BEGIN--PATTERN   # print 4
ggg                   # print 5
Any--END--Pattern     # print 6
hhh                   # print 7
Any--END--Pattern     # print 8

Notes

  • Print from the latest ANY--BEGIN--PATTERN before the current Any--END--Pattern.
  • If further Any--END--Pattern lines follow with no new ANY--BEGIN--PATTERN in between, keep printing up to the last of them.

There are many similar questions, but I cannot find an answer to this issue

The answers I have tested from these questions print the line ccc and/or the line iii... or do not print the lines having the BEGIN and END patterns. My several attempts have these same drawbacks and defects.

We could write a ten-line script, but I am sure there is an elegant one-line command that solves this issue; I just cannot find it. Therefore I think this could be a good SO question ;-)

I wonder which tricks to use from sed, awk, perl or any other tool readily available on our Unix-like systems. Please provide a tiny command line using whichever of these tools you think fits best.


EDIT:

Just to underline the pretty simple command line from Sundeep's comment that simplifies the problem by reversing the input file:

tac input.txt | sed -n '/END/,/BEGIN/p' | tac

But this command line also prints the beginning of the file, up to the first Any--END--Pattern
(this case may not arise for other users looking into a similar issue):

aaa
Any--END--Pattern
ANY--BEGIN--PATTERN   # print 1
ddd                   # print 2
Any--END--Pattern     # print 3
ANY--BEGIN--PATTERN   # print 4
ggg                   # print 5
Any--END--Pattern     # print 6
hhh                   # print 7
Any--END--Pattern     # print 8

(This answer is used within these C++ coding rules)
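
One possible way to drop that unwanted beginning is to filter the result once more and keep only what follows the first BEGIN line. This is a minimal sketch, not part of Sundeep's comment: the function name blocks_via_tac is hypothetical, the sample file is the input above with its trailing comments stripped, and tac is assumed to be available (GNU coreutils).

```shell
# Sample input from the question, without the trailing "# print" comments.
cat > input.txt <<'EOF'
aaa
Any--END--Pattern
bbb
ANY--BEGIN--PATTERN
ccc
ANY--BEGIN--PATTERN
ddd
Any--END--Pattern
eee
fff
ANY--BEGIN--PATTERN
ggg
Any--END--Pattern
hhh
Any--END--Pattern
iii
ANY--BEGIN--PATTERN
jjj
EOF

# Hypothetical helper: reverse, select END..BEGIN ranges, restore the order,
# then discard everything before the first BEGIN line.
blocks_via_tac() {
  tac "$1" |
    sed -n '/END/,/BEGIN/p' |   # reversed file: each END back to the nearest BEGIN
    tac |
    sed -n '/BEGIN/,$p'         # drop the leading lines that have no BEGIN before them
}

blocks_via_tac input.txt
```

The extra filter is enough here because only the range that runs to the end of the reversed stream can lack its BEGIN, and after the second tac that range sits at the very top of the output.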

3 answers
骚年 ilove
#2 · 2019-09-03 16:44

awk to the rescue!

$ awk '/BEGIN/{c=0; b=1} 
              {a[c++]=$0} 
      b&&/END/{for(i=0;i<c;i++) print a[i]; delete a; c=0}' file

ANY--BEGIN--PATTERN   # print 1
ddd                   # print 2
Any--END--Pattern     # print 3
ANY--BEGIN--PATTERN   # print 4
ggg                   # print 5
Any--END--Pattern     # print 6
hhh                   # print 7
Any--END--Pattern     # print 8
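
For readers who want to follow the one-liner above step by step, here is a commented sketch of the same logic. The variable names (buf, count, armed) and the smallest_blocks wrapper are illustrative and not from the answer, the sample file is the question's input with its trailing comments stripped, and the delete is left out on the assumption that resetting the counter is sufficient.

```shell
# Sample input from the question, without the trailing "# print" comments.
cat > input.txt <<'EOF'
aaa
Any--END--Pattern
bbb
ANY--BEGIN--PATTERN
ccc
ANY--BEGIN--PATTERN
ddd
Any--END--Pattern
eee
fff
ANY--BEGIN--PATTERN
ggg
Any--END--Pattern
hhh
Any--END--Pattern
iii
ANY--BEGIN--PATTERN
jjj
EOF

# Hypothetical wrapper around the awk program so it can be reused.
smallest_blocks() {
  awk '
    # A new BEGIN discards whatever was buffered and arms printing.
    /BEGIN/ { count = 0; armed = 1 }
    # Buffer every line, including the BEGIN and END lines themselves.
    { buf[count++] = $0 }
    # An END seen after at least one BEGIN flushes the buffer and empties it.
    armed && /END/ {
      for (i = 0; i < count; i++) print buf[i]
      count = 0
    }
  ' "$@"
}

smallest_blocks input.txt
```

Because armed is never cleared, an END that follows another END with no BEGIN in between also flushes the lines buffered since the previous flush, which is what makes hhh and the last Any--END--Pattern appear in the output.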
叛逆
#3 · 2019-09-03 17:02

Perl to the rescue!

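#!/usr/bin/perl
use warnings;
use strict;

my $last_end;   # index of the last END line in @buffer, if any
my @buffer;     # lines collected since the latest BEGIN
while (<>) {
    if (/BEGIN/) {
        # A new BEGIN closes the previous block: print it up to its last END.
        print @buffer[ 0 .. $last_end ] if defined $last_end;
        @buffer = $_;
        undef $last_end;
        next;
    }
    # In scalar context @buffer is the index this line gets once pushed.
    $last_end = @buffer if @buffer && /END/;
    push @buffer, $_ if @buffer;
}
# Flush the final block when the input ends without another BEGIN.
print @buffer[ 0 .. $last_end ] if defined $last_end;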

@buffer accumulates the lines from BEGIN, $last_end points to, well, the last END in the buffer, so you can throw away accumulated lines that don't end in an END.

As a one-liner (but why?):

perl -ne 'defined $l && print(@B[0..$l]), (@B, $l) = $_, next if /BEGIN/; $l=@B if @B && /END/; push @B, $_ if @B; END { print @B[0..$l] if defined $l }' file
我想做一个坏孩纸
#4 · 2019-09-03 17:06

This should work with GNU sed

sed '$b1;/BEGIN/{:1;x;s/\(BEGIN.*END[^\n]*\).*/\1/;t;x;h;d};H;d' file