Find a regex match and move the following lines to the end of the matching line

Posted 2019-07-29 18:25

Question:

I have text like this:

37    7    --------------  No  aaa
40    0    --------------  No  bbb
xxx   zzy
aa    bb   cc
42    2    --------------  No  ccc
xxx   zyz
a     b    c               d
43    3    --------------  No  ddd
xy    zz
a     a
a     a
c
52    5    --------------  No  eee
yyyx  zzz

When I process it with awk I get:

awk '{if($1+0==$1) p=$1 FS $2 FS $3 FS $4 FS $5; else $0=p FS $0}1' /tmp/test3 | column -t
37  7  --------------  No  aaa
37  7  --------------  No  aaa  xxx   zzz
40  0  --------------  No  bbb
40  0  --------------  No  bbb  xxx   zzy
40  0  --------------  No  bbb  aa    bb   cc
42  2  --------------  No  ccc
42  2  --------------  No  ccc  xxx   zyz
42  2  --------------  No  ccc  a     b    c   d
43  3  --------------  No  ddd
43  3  --------------  No  ddd  xy    zz
43  3  --------------  No  ddd  a     a
43  3  --------------  No  ddd  a     a
43  3  --------------  No  ddd  c
52  5  --------------  No  eee
52  5  --------------  No  eee  yyyx  zzz

and I need to get the following output:

37    7    --------------  No  aaa
40    0    --------------  No  bbb xxx   zzy
40    0    --------------  No  bbb aa    bb   cc
42    2    --------------  No  ccc xxx   zyz
42    2    --------------  No  ccc a     b    c  d
43    3    --------------  No  ddd xy    zz
43    3    --------------  No  ddd a     a
43    3    --------------  No  ddd a     a
43    3    --------------  No  ddd c
52    5    --------------  No  eee yyyx  zzz

Thanks in advance for your help! I've also tried awk '/-/{base=$0; next} {print base, $0}' /tmp/test4 | column -t as suggested, but it drops a line starting with a number whenever the next line also starts with a number.

UPDATE

This sed spell solved my problem: sed -r ':a;N;/^[0-9].*\n[0-9]/{P;D};:b;s/^(.*)\n(.*)/\1 \2\n\1/;P;s/.*\n//;$d;N;/\n[0-9]/D;bb' /tmp/test2

One more question: if an output line has more than 8 columns, is there a way to modify the sed command so that it moves the 9th, 10th and 11th columns to a new line and copies the first 5 columns in front of them?

Let's say I have these 3 lines:

42    2    --------------  No  ccc xxx   zyz
42    2    --------------  No  ccc a     b    c    d    e    f
43    3    --------------  No  ddd xy    zz

and I'd like to get:

42    2    --------------  No  ccc xxx   zyz
42    2    --------------  No  ccc a     b    c
42    2    --------------  No  ccc d     e    f
43    3    --------------  No  ddd xy    zz

Answer 1:

The Perl script below assumes the following requirements.

The input contains alternating blocks of lines starting with either a number or a non-number, where each block of number-lines is followed by a block of text-lines. Updated: For the output, the first five columns of the last number-line in its block need to be prepended to each of the text-lines in the immediately following text-block. The remaining number-lines are printed on their own.

The code collects number and text lines in their buffers. They are processed and emptied once we get to the first line of the next number-lines block, which is when both buffers are non-empty.

use warnings;
use strict;
use feature 'say';

my $file = shift @ARGV || 'default_filename.txt';
die "Usage: $0 file\n" if not $file;

open my $fh, '<', $file or die "Can't open $file: $!";

my (@text, @nums);

while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^[^0-9]/) { 
        push @text, $line;
        if (eof) {
            process_buffers(\@nums, \@text);
            last
        }
        next;
    }
    elsif (@nums and @text) {
        process_buffers(\@nums, \@text);
    }

    push @nums, $line;
}

sub process_buffers {
    my ($rnums, $rtext) = @_;

    # Remove last number line from array and take its first five columns
    my @last_num_line_cols = (split ' ', pop @$rnums)[0..4];
    # Print other number lines; all consecutive spaces replaced by tabs
    say for map { s/\s+/\t/gr } @$rnums;

    # Print text lines prepended by five columns of last number line
    foreach my $text_line (@$rtext) {
        say join "\t", @last_num_line_cols, $text_line;
    }   

    @$rtext = ();
    @$rnums = ();
}

The condition involving eof above is needed to process the last batch of number and text blocks, since no other test can fire on the last line. Its placement assumes that the last line is a text-line, which follows from my assumption about the requirements.

This prints

37      7       --------------  No      aaa
40      0       --------------  No      bbb     xxx   zzy
40      0       --------------  No      bbb     aa    bb   cc
42      2       --------------  No      ccc     xxx   zyz
42      2       --------------  No      ccc     a     b    c               d
43      3       --------------  No      ddd     xy    zz
43      3       --------------  No      ddd     a     a
43      3       --------------  No      ddd     a     a
43      3       --------------  No      ddd     c
52      5       --------------  No      eee     yyyx  zzz

(aligned on tabs, as expected in input and wanted in output)
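For comparison, the same idea can be sketched in awk, the tool used in the question. This is only a minimal sketch, not a port of the Perl script above: the function name join_blocks is hypothetical, it assumes the input starts with a number-line, and it prints lone number-lines unchanged instead of converting their whitespace to tabs.

```shell
# join_blocks (hypothetical name): prefix each text-line with the first
# five fields of the most recent number-line. A number-line is printed by
# itself only when no text-line follows it.
join_blocks() {
    awk '
        /^[0-9]/ {
            # A number-line still pending here had no text-lines after it
            if (pending != "") print pending
            pending = $0
            prefix = $1 OFS $2 OFS $3 OFS $4 OFS $5
            next
        }
        {
            # Text-line: the pending number-line is now represented by its prefix
            pending = ""
            print prefix, $0
        }
        END { if (pending != "") print pending }
    ' "$@"
}
```

It could be run as join_blocks /tmp/test3 | column -t. Unlike the Perl version, nothing is buffered beyond one line, which is enough here because only the last number-line of a block is ever needed.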


Update: Limit the output width to 8 columns, as described in the question update

Use this modified version of the processing function

sub process_buffers_fmt {
    my ($rnums, $rtext) = @_;

    my @last_num_line_cols = (split ' ', pop @$rnums)[0..4];
    say for map { s/\s+/\t/gr } @$rnums;

    # Format output lines to 8 columns at most
    foreach my $text_line (@$rtext) {
        my @text_cols = split ' ', $text_line;
        while (my @prn_text_cols = splice @text_cols, 0, 3) {
            say join "\t", @last_num_line_cols, @prn_text_cols;
        }    
    }
    @$rtext = ();
    @$rnums = ();
}

This uses splice to remove the first three columns of text output at a time and print them with the (five) columns of the last number line. This is done in a while loop so it stops once @text_cols is all processed (printed).

To test this, I add the following line to the text block after the 43 3 ... number line in the input file

a b c d e f g h i j k

and the output of the main program acquires these extra lines

43      3       --------------  No      ddd     a       b       c
43      3       --------------  No      ddd     d       e       f
43      3       --------------  No      ddd     g       h       i
43      3       --------------  No      ddd     j       k

The input file that I use to test all requirements and updates is

37    7    --------------  No  aaa MORE COLUMNS
40    0    --------------  No  bbb
xxx   zzy
aa    bb   cc
42    2    --------------  No  ccc 
xxx   zyz
a     b    c               d
43    3    --------------  No  ddd  AND YET MORE
xy    zz
a     a 
a     a 
c
a b c d e f g h i j k
52    5    --------------  No  eee
yyyx  zzz

and the output of the program (with process_buffers_fmt function) is

37      7       --------------  No      aaa     MORE    COLUMNS
40      0       --------------  No      bbb     xxx     zzy
40      0       --------------  No      bbb     aa      bb      cc
42      2       --------------  No      ccc     xxx     zyz
42      2       --------------  No      ccc     a       b       c
42      2       --------------  No      ccc     d
43      3       --------------  No      ddd     xy      zz
43      3       --------------  No      ddd     a       a
43      3       --------------  No      ddd     a       a
43      3       --------------  No      ddd     c
43      3       --------------  No      ddd     a       b       c
43      3       --------------  No      ddd     d       e       f
43      3       --------------  No      ddd     g       h       i
43      3       --------------  No      ddd     j       k
52      5       --------------  No      eee     yyyx    zzz
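The three-columns-at-a-time wrapping done with splice can also be sketched in awk. Again this is only an illustration under assumptions: the function name wrap_cols and the per_line variable are hypothetical, the input is assumed to start with a number-line, and lone number-lines are printed unchanged rather than re-joined with tabs.

```shell
# wrap_cols (hypothetical name): emit at most per_line text columns per
# output line, repeating the five-column prefix on every continuation line.
wrap_cols() {
    awk -v per_line=3 '
        BEGIN { OFS = "\t" }
        /^[0-9]/ {
            # A number-line still pending here had no text-lines after it
            if (pending != "") print pending
            pending = $0
            prefix = $1 OFS $2 OFS $3 OFS $4 OFS $5
            next
        }
        {
            pending = ""
            line = prefix
            for (i = 1; i <= NF; i++) {
                line = line OFS $i
                # Flush after every per_line columns and after the last column
                if (i % per_line == 0 || i == NF) {
                    print line
                    line = prefix
                }
            }
        }
        END { if (pending != "") print pending }
    ' "$@"
}
```

The modulo test plays the role of splice in the Perl function: each group of three fields is flushed as soon as it is complete, and a shorter final group is flushed when the fields run out.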


Answer 2:

You can use the command below; I hope it helps.

awk '{if($1+0==$1) p=$1 FS $2 FS $3 FS $4 FS $5; else $0=p FS $0}1' test.txt | sort -k2 | column -t | awk '{ if ($6 >= " ") { print } }'


Answer 3:

This might work for you (GNU sed):

sed -r ':a;N;s/^(.*)\n\1(.)/\1\2/;ta;P;D' file

Open a window of at least two lines. If the current line begins with the whole of the previous line and is longer, remove the previous line and repeat. Otherwise, print then delete the first line and repeat.

N.B. this is run following the awk script.
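For readers without GNU sed, the rule just described (drop a line when the next line extends it) could be restated in awk roughly as follows. This is a sketch of the same filtering idea, not a translation of the sed command; the function name drop_extended is hypothetical, and like the sed version it is meant to run on the output of the awk script.

```shell
# drop_extended (hypothetical name): print a line only if the next line
# does not start with it. A line that the next line merely extends is dropped.
drop_extended() {
    awk '
        # Decide the fate of the previous line once the current one is known
        NR > 1 && !(substr($0, 1, length(prev)) == prev && length($0) > length(prev)) {
            print prev
        }
        { prev = $0 }
        # The last line has no successor, so it is always printed
        END { if (NR > 0) print prev }
    ' "$@"
}
```

A possible pipeline would be awk '...' file | drop_extended, with the awk program from the question in place of the dots.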

To achieve the same solution using the original data, use:

sed -r ':a;N;/^[0-9].*\n[0-9]/{P;D};:b;s/^(.*)\n(.*)/\1 \2\n\1/;P;s/.*\n//;$d;N;/\n[0-9]/D;bb' file