Matching regex of multiple lines in AWK. && operat

Posted 2019-07-03 21:55

I am not sure if the && operator works in regular expressions. What I am trying to do is match a line such that it starts with a number and has the letter 'a' AND the next line starts with a number and has the letter 'b' AND the next line... letter 'c'. This abc sequence will be used as a unique identifier to start reading the file.

Here is what I am sort of going for in awk.

/(^[0-9]+ .*a)&&\n(^[0-9]+ .*b)&&\n(^[0-9]+ .*c) {
print $0
}

A single one of these regexes, such as (^[0-9]+ .*a), works on its own, but I am not sure how to chain them together with "and the next line matches this".

My file would be like:

JUNK UP HERE NOT STARTING WITH NUMBER
1     a           0.110     0.069          
2     a           0.062     0.088          
3     a           0.062     0.121          
4     b           0.062     0.121          
5     c           0.032     0.100         
6     d           0.032     0.100          
7     e           0.032     0.100   

And what I want is:

3     a           0.062     0.121          
4     b           0.062     0.121          
5     c           0.032     0.100         
6     d           0.032     0.100          
7     e           0.032     0.100 

3 answers
叛逆
#2 · 2019-07-03 22:15

No, && doesn't work inside a regular expression. You could try something like this:

/(^[0-9]+.*a[^\n]*)\n([0-9]+.*b[^\n]*)\n([0-9]+.*c[^\n]*)/

And repeat that for as many letters as you need.

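The [^\n]* matches as many non-linebreak characters in a row as possible (that is, everything up to the line break). Note that with awk's default record separator each record is a single line, so a pattern containing \n can only match if a record is made to span several lines, for example by changing RS.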
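
One way to give awk a string that really contains the line breaks is to collect the whole input first and run the multi-line match in an END block. The following is only a minimal sketch under assumptions, not the answerer's code: the helper name first_abc_block is made up for illustration, the letters are assumed to start the second field, and the whole input is assumed to fit in memory.

```shell
# Hypothetical helper: print from the first a/b/c triple to the end of input.
first_abc_block() {
    awk '
        { buf = buf $0 "\n" }    # append every line, restoring its newline
        END {
            # Three consecutive numbered lines whose 2nd field starts
            # with a, then b, then c.
            re = "(^|\n)[0-9]+[ \t]+a[^\n]*\n[0-9]+[ \t]+b[^\n]*\n[0-9]+[ \t]+c[^\n]*\n"
            if (match(buf, re)) {
                start = RSTART
                # Skip the newline consumed by the (^|\n) alternative.
                if (substr(buf, start, 1) == "\n") start++
                printf "%s", substr(buf, start)
            }
        }
    '
}

printf 'JUNK\n1 a\n2 a\n3 b\n4 c\n5 d\n' | first_abc_block
```

Here [^\n]* is used instead of .* because in awk's regular expressions a dot can also match a newline, which would let one "line" of the pattern swallow several input lines.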

三岁会撩人
#3 · 2019-07-03 22:18

A friend wrote this awk program for me. It is a state machine. And it works.

#!/usr/bin/awk -f

BEGIN {
    # We start out in the "idle" state.
    state = "idle"
}

/^[0-9]+[[:space:]]+q/ {
    # Every time we encounter a "# q" line, we go to the "q_found"
    # state, unless we are already printing.
    if (state != "printing") {
        state = "q_found"
        line_q = $0
    }
}

/^[0-9]+[[:space:]]+r/ {
    # If we are in the q_found state and "# r" immediately follows,
    # advance to the r_found state.  Else, return to "idle" and 
    # wait for the "# q" to start us off.
    if (state == "q_found") {
        state = "r_found"
        line_r = $0
    } else if (state != "printing") {
        state = "idle"
    }
}

/^[0-9]+[[:space:]]+l/ {
    # If we are in the r_found state and "# l" immediately follows,
    # advance to the l_found state.  Else, return to "idle" and 
    # wait for the "# q" to start us off.
    if (state == "r_found") {
        state = "l_found"
        line_l = $0
    } else if (state != "printing") {
        state = "idle"
    }
}

/^[0-9]+[[:space:]]+i/ {
    # If we are in the l_found state and "# i" immediately follows,
    # we're ready to start printing.  First, display the lines we
    # squirrelled away then move to the "printing" state.  Else,
    # go to "idle" and wait for the "# q" to start us off.
    if (state == "l_found") {
        state = "printing"
        print line_q
        print line_r
        print line_l
        line = 0
    } else if (state != "printing") {
        state = "idle"
    }
}

/^[0-9]+[[:space:]]+/ {
    # If in state "printing", print 47 more numbered lines (50 in total,
    # counting the three stashed lines), then stop printing.
    if (state == "printing") {
        if (++line < 48) print
    }
}
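
The same state-machine idea can be written more compactly by counting how many steps of the sequence have matched so far. This is a rough sketch under assumptions rather than the program above: print_from_sequence is a hypothetical helper name, the sequence is passed as a space-separated string, the letters in it are assumed to be distinct, the second field must equal the letter exactly, and there is no 50-line limit.

```shell
# Hypothetical helper: $1 is the sequence to look for, e.g. "a b c".
print_from_sequence() {
    awk -v seq="$1" '
        BEGIN { n = split(seq, want, " ") }
        found { print; next }              # the "printing" state
        {
            numbered = ($1 ~ /^[0-9]+$/)
            if (numbered && $2 == want[matched + 1]) {
                held[++matched] = $0       # advance one state
            } else if (numbered && $2 == want[1]) {
                matched = 1; held[1] = $0  # restart at the first state
            } else {
                matched = 0                # back to "idle"
            }
            if (matched == n) {
                # Whole sequence seen on consecutive lines: flush and print on.
                for (i = 1; i <= n; i++) print held[i]
                found = 1
            }
        }
    '
}

printf 'JUNK\n1 a\n2 a\n3 b\n4 c\n5 d\n' | print_from_sequence 'a b c'
```

Any line that does not continue the sequence resets the counter, so the letters have to appear on directly consecutive lines.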
贼婆χ
#4 · 2019-07-03 22:25

[Update based on clarification.]

The key point is that Awk is line-oriented by default: each record is a single line, so an ordinary pattern match cannot span lines. The usual way to do something like this is to match each line separately, and have a later clause or statement figure out whether all the right pieces have been matched.
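
A quick way to see this is to run the same two-line pattern with different record separators. This is an illustrative sketch only: try_multiline is a hypothetical helper name, and paragraph mode (an empty RS) only helps when the lines of interest form a block delimited by blank lines, which may not be true of the file in the question.

```shell
# Hypothetical helper: $1 is the value to use for awk's RS.
try_multiline() {
    awk -v RS="$1" '
        /a\n[0-9]+ b/ { hit = 1 }
        END { print (hit ? "matched" : "no match") }
    '
}

# Default RS: each record is one line, so $0 never contains "\n".
printf '1 a\n2 b\n' | try_multiline '\n'    # prints "no match"

# Paragraph mode (empty RS): the record is the whole two-line block.
printf '1 a\n2 b\n' | try_multiline ''      # prints "matched"
```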

What I'm doing here is looking for an a in the second field on one line, a b in the second field on another line, and a c in the second field on a third line. In the first two cases, I stash away the contents of the line as well as the line number it occurred on. When the third line is matched and we haven't yet found the whole sequence, I go back and check that the other two lines were seen on the two immediately preceding line numbers. If so, I print out the buffered previous lines and set a flag indicating that everything else should print.

Here's the script:

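# Remember the most recent "a" and "b" lines and where they occurred.
$2 == "a" { a = $0; aLine = NR; }
$2 == "b" { b = $0; bLine = NR; }
# On a "c" line, check that "b" and "a" were the two lines just before it.
$2 == "c" && !keepPrinting {
    if ((bLine == (NR - 1)) && (aLine == (NR - 2))) {
        print a;
        print b;
        keepPrinting = 1;
    }
}
# Once the sequence has been found, print every line, including this "c".
keepPrinting { print; }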

And here's a file I tested it with:

JUNK UP HERE NOT STARTING WITH NUMBER
1     a           0.110     0.069
2     a           0.062     0.088
3     a           0.062     0.121
4     b           0.062     0.121
5     c           0.032     0.100
6     d           0.032     0.100
7     e           0.032     0.100
8     a           0.099     0.121
9     b           0.098     0.121
10    c           0.097     0.100
11    x           0.000     0.200

Here's what I get when I run it:

$ awk -f blort.awk blort.txt
3     a           0.062     0.121
4     b           0.062     0.121
5     c           0.032     0.100
6     d           0.032     0.100
7     e           0.032     0.100
8     a           0.099     0.121
9     b           0.098     0.121
10    c           0.097     0.100
11    x           0.000     0.200