How to save the lines of grep matches?

Posted 2019-09-19 00:37

Question:

I have a grep like this:

iarr=$(grep -Poh '^INPUT.*' file.txt)
oarr=$(grep -Poh '^OUTPUT.*' file.txt)

So with this content of file.txt:

INPUT  hello.txt
OUTPUT stack.txt
INPUT  stack.txt
OUTPUT hello.txt
INPUT  overflow.txt
OUTPUT overflow.txt
OUTPUT byebye.txt
INPUT  byebye.txt
INPUT  nick.txt
OUTPUT jesus.txt

The output would be:

iarr

INPUT hello.txt
INPUT stack.txt
INPUT overflow.txt
INPUT byebye.txt
INPUT nick.txt

oarr

OUTPUT stack.txt
OUTPUT hello.txt
OUTPUT overflow.txt
OUTPUT byebye.txt
OUTPUT jesus.txt

However, I want to know which files I actually need as input and which files will be the final output. That is:

  • If an INPUT line for a file is followed, somewhere below it, by an OUTPUT line for the same file, then that OUTPUT item is deleted from oarr.
  • If an OUTPUT line for a file is followed, somewhere below it, by an INPUT line for the same file, then that INPUT item is deleted from iarr.

With these conditions, the result would be the following:

iarr

INPUT hello.txt
INPUT overflow.txt
INPUT nick.txt

oarr

OUTPUT stack.txt
OUTPUT byebye.txt
OUTPUT jesus.txt

I'm trying to get this result, but I don't know how to store the line number of a specific regexp match from the grep command. This is the approach I thought of; do you have another idea? Maybe reading the file line by line...

Answer 1:

If I understand your question correctly, you only need the first occurrence of each filename. You can achieve this using awk:

awk '!x[$2]++' file.txt

will thus give

INPUT  hello.txt
OUTPUT stack.txt
INPUT  overflow.txt
OUTPUT byebye.txt
INPUT  nick.txt
OUTPUT jesus.txt

on which you can continue your processing.
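
To illustrate that follow-up step, here is a minimal sketch that splits the deduplicated lines back into the iarr and oarr variables from the question. The helper names first_seen and split_io and the temporary sample file are assumptions made for this example, not part of the original answer. The array is called seen instead of x only for readability.

```shell
#!/bin/sh
# Sketch: keep the first occurrence of each filename, then split the
# surviving lines into the two variables used in the question.

# Hypothetical helper: print only the first line seen for each filename.
first_seen() {
    # seen[$2]++ evaluates to 0 the first time a filename ($2) appears,
    # so !seen[$2]++ is true only for that first line, which awk prints.
    awk '!seen[$2]++' "$1"
}

# Hypothetical helper: fill iarr and oarr from the deduplicated lines.
split_io() {
    # print $1, $2 joins the two fields with a single space.
    iarr=$(first_seen "$1" | awk '$1 == "INPUT"  { print $1, $2 }')
    oarr=$(first_seen "$1" | awk '$1 == "OUTPUT" { print $1, $2 }')
}

# Sample data from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
INPUT  hello.txt
OUTPUT stack.txt
INPUT  stack.txt
OUTPUT hello.txt
INPUT  overflow.txt
OUTPUT overflow.txt
OUTPUT byebye.txt
INPUT  byebye.txt
INPUT  nick.txt
OUTPUT jesus.txt
EOF

split_io "$sample"
printf 'iarr\n%s\n\noarr\n%s\n' "$iarr" "$oarr"
rm -f "$sample"
```

Both variables hold newline-separated strings, as in the original grep assignments. If real bash arrays are preferred, the same pipelines could be fed to mapfile -t in bash 4 or later.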



Tags: regex bash grep