I am not sure if the && operator works in regular expressions. What I am trying to do is match a line such that it starts with a number and has the letter 'a' AND the next line starts with a number and has the letter 'b' AND the next line... letter 'c'. This abc sequence will be used as a unique identifier to start reading the file.
Here is what I am sort of going for in awk.
/(^[0-9]+ .*a)&&\n(^[0-9]+ .*b)&&\n(^[0-9]+ .*c) {
print $0
}
A single one of these regexes works on its own, e.g. (^[0-9]+ .*a), but I am not sure how to string them together with AND THE NEXT LINE IS THIS.
My file would be like:
JUNK UP HERE NOT STARTING WITH NUMBER
1 a 0.110 0.069
2 a 0.062 0.088
3 a 0.062 0.121
4 b 0.062 0.121
5 c 0.032 0.100
6 d 0.032 0.100
7 e 0.032 0.100
And what I want is:
3 a 0.062 0.121
4 b 0.062 0.121
5 c 0.032 0.100
6 d 0.032 0.100
7 e 0.032 0.100
No, it doesn't work. You could try something like this:
And repeat that for as many letters as you need.
The [^\n]* will match as many non-linebreak characters in a row as possible (so up to the line break).
A friend wrote this awk program for me. It is a state machine. And it works.
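For anyone who wants to see the idea, here is a minimal sketch of what a state machine for this could look like. It is not necessarily the program my friend wrote. It assumes the letter is the whole second field, as in the sample file, and the function and variable names (print_from_abc_state, state, line_a, line_b, found) are made up for illustration.

```shell
# Hypothetical helper: print the a/b/c marker lines and everything after them.
# state = 0: nothing matched yet, 1: saw the "a" line, 2: saw "a" then "b".
print_from_abc_state() {
  awk '
    # Once the marker has been found, pass every line through.
    found { print; next }

    # A line that does not start with a number breaks any partial sequence.
    $1 !~ /^[0-9]+$/ { state = 0; next }

    # "c" directly after "a" and "b": emit the two saved lines and this one.
    state == 2 && $2 == "c" { print line_a; print line_b; print; found = 1; next }

    # "b" directly after "a": remember it and advance.
    state == 1 && $2 == "b" { line_b = $0; state = 2; next }

    # Any "a" line (re)starts the sequence, so a run of "a" lines keeps the last one.
    $2 == "a" { line_a = $0; state = 1; next }

    # Anything else resets the machine.
    { state = 0 }
  ' "$@"
}
```

Called as print_from_abc_state yourfile (or with the file on stdin), it prints the lines from 3 a through 7 e for the sample file in the question.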
[Update based on clarification.]
One high order bit is that Awk is a line-oriented language, so you won't actually be able to do a normal pattern match to span lines. The usual way to do something like this is to match each line separately, and have a later clause / statement figure out if all the right pieces have been matched.
What I'm doing here is looking for an a in the second field on one line, a b in the second field on another line, and a c in the second field on a third line. In the first two cases, I stash away the contents of the line as well as what line number it occurred on. When the third line is matched and we haven't yet found the whole sequence, I go back and check to see if the other two lines are present and with acceptable line numbers. If all's good, I print out the buffered previous lines and set a flag indicating that everything else should print.
Here's the script:
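A minimal sketch of the approach described above, not necessarily the exact script. It takes "acceptable line numbers" to mean that the a, b and c lines are directly consecutive, which is an assumption, and the names (print_from_abc_buffered, a_line, a_nr, b_line, b_nr, found) are chosen here for illustration.

```shell
# Hypothetical helper: buffer the "a" and "b" lines, then check them on "c".
print_from_abc_buffered() {
  awk '
    # Stash the most recent "a" and "b" lines together with their line numbers.
    $2 == "a" { a_line = $0; a_nr = NR }
    $2 == "b" { b_line = $0; b_nr = NR }

    # On a "c" line, before the sequence has been found, check that the
    # buffered lines sit on the two lines immediately before this one.
    # a_nr > 0 guards against the case where no "a" line was ever seen.
    $2 == "c" && !found {
      if (a_nr > 0 && a_nr == NR - 2 && b_nr == NR - 1) {
        print a_line
        print b_line
        found = 1          # from here on, everything prints
      }
    }

    # After the flag is set, print the current line (including the "c" line).
    found { print }
  ' "$@"
}
```

Run against the sample file from the question, this prints the five lines from 3 a through 7 e. Each line is still matched on its own; only the saved line numbers tie the three matches together.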
And here's a file I tested it with:
Here's what I get when I run it: