Remove a set of rows from a file separated by a wh

Posted 2020-04-01 08:09

Question:

I have a file containing lines like those shown below. I want to delete a whole set of rows if any line in that set contains the keyword SEDS2-TOP. Sets of rows are separated by a blank line.

0.00  600.00  2214.28   785.71 1.00000 SEDS1-BOTTOM
0.00  600.00  2214.28   785.71 1.00000 SEDS1-TOP
0.00  600.00  1500.00     0.00 1.00000 WATER-BOTTOM

0.00  600.00  3446.97  1757.08 1.00000 SEDS2-TOP
0.00  600.00  2218.64   790.51 1.00000 SEDS1-BOTTOM
0.00  600.00  2218.64   790.51 1.00000 SEDS1-TOP
0.00    0.00  600.00  1500.00  1.00000 WATER-BOTTOM

0.00  400.00  2004.28   785.71 1.00000 SEDS1-BOTTOM
0.00  300.00  2254.28   785.71 1.00000 SEDS1-TOP
0.00  600.00  1600.00     0.00 1.00000 WATER-BOTTOM

0.00  600.00  3446.97  1757.08 1.00000 SEDS2-TOP
0.00  600.00  1500.00     0.00 1.00000 WATER-BOTTOM

0.00  600.00  3446.97  1757.08 1.00000 SEDS2-TOP

For example, the output file should contain:

0.00  600.00  2214.28   785.71 1.00000 SEDS1-BOTTOM
0.00  600.00  2214.28   785.71 1.00000 SEDS1-TOP
0.00  600.00  1500.00     0.00 1.00000 WATER-BOTTOM

0.00  400.00  2004.28   785.71 1.00000 SEDS1-BOTTOM
0.00  300.00  2254.28   785.71 1.00000 SEDS1-TOP
0.00  600.00  1600.00     0.00 1.00000 WATER-BOTTOM

Answer 1:

You can do it in awk using three rules plus the END rule. It can be written as follows:

awk 'NF==0 {              # empty line
    for (i in a)          # for each line in array a
        print i           # output line (index)
    if (i in a)           # if lines exists
        print ""          # output blank line at end
    delete a              # clear a array
    del=0                 # set delete group flag 0
    next                  # get next record
}
/SEDS2-TOP/ {             # SEDS2-TOP matched in record
    del=1                 # set delete group flag 1
    delete a              # delete array a
    next                  # get next records
}
del==0 {                  # del group flag is zero
    a[$0]++               # add line as index to array a
}
END {                     # END rule - process last group of lines
    if (del==0) {         # if del group flag not set
        for (i in a)      # loop over lines in a
            print i       # output line (index)
        print ""          # with newline after
    }
}' rowsets

Example Use/Output

Using your data file as input, you can simply select-copy the script above (changing the filename containing the row-sets from rowsets to whatever you have), then middle-mouse paste it into your terminal in the directory with the file, e.g.

$ awk 'NF==0 {              # empty line
>     for (i in a)          # for each line in array a
>         print i           # output line (index)
>     if (i in a)           # if lines exists
>         print ""          # output blank line at end
>     delete a              # clear a array
>     del=0                 # set delete group flag 0
>     next                  # get next record
> }
> /SEDS2-TOP/ {             # SEDS2-TOP matched in record
>     del=1                 # set delete group flag 1
>     delete a              # delete array a
>     next                  # get next records
> }
> del==0 {                  # del group flag is zero
>     a[$0]++               # add line as index to array a
> }
> END {                     # END rule - process last group of lines
>     if (del==0) {         # if del group flag not set
>         for (i in a)      # loop over lines in a
>             print i       # output line (index)
>         print ""          # with newline after
>     }
> }' rowsets
0.00  600.00  1500.00     0.00 1.00000 WATER-BOTTOM
0.00  600.00  2214.28   785.71 1.00000 SEDS1-BOTTOM
0.00  600.00  2214.28   785.71 1.00000 SEDS1-TOP

0.00  400.00  2004.28   785.71 1.00000 SEDS1-BOTTOM
0.00  300.00  2254.28   785.71 1.00000 SEDS1-TOP
0.00  600.00  1600.00     0.00 1.00000 WATER-BOTTOM
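
One caveat with storing each line as the array index: `for (i in a)` visits indices in an unspecified order (which is why WATER-BOTTOM comes first in the first group above), and identical rows within a group are stored only once. A minimal sketch of the second effect, using two made-up rows that are not part of your data:

```shell
# Two identical rows in one group, stored with the line as the array index.
# The rows here are hypothetical sample data.
unique_rows=$(printf '1.0 SEDS1-TOP\n1.0 SEDS1-TOP\n' |
    awk '{ a[$0]++ } END { for (i in a) print i }' | grep -c .)
echo "$unique_rows"
```

If your groups can contain repeated rows, this would drop the repeats.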

Preserving Row Order

If preserving the row-order is needed, then instead of using the line as the index, you can introduce a new counter variable to be used as the index that would correspond to the row number in the array. That allows you to output the rows in their original order, e.g.

awk -v ndx=1 '
NF==0 {                   # empty line
    for (i=1; i<ndx; i++) # for each line in array a
        print a[i]        # output line
    if (ndx > 1)          # if lines exists
        print ""          # output blank line at end
    delete a              # clear a array
    del=0                 # set delete group flag 0
    ndx=1                 # reset array index 1
    next                  # get next record
}
/SEDS2-TOP/ {             # SEDS2-TOP matched in record
    del=1                 # set delete group flag 1
    delete a              # delete array a
    ndx=1                 # reset array index 1
    next                  # get next records
}
del==0 {                  # del group flag is zero
    a[ndx++]=$0           # add line to array a
}
END {                     # END rule - process last group of lines
    if (del==0) {         # if del group flag not set
        for (i=1; i<ndx; i++)   # loop over lines in a
            print a[i]    # output line
        print ""          # with newline after
    }
}' rowsets

In that case, your output would be:

0.00  600.00  2214.28   785.71 1.00000 SEDS1-BOTTOM
0.00  600.00  2214.28   785.71 1.00000 SEDS1-TOP
0.00  600.00  1500.00     0.00 1.00000 WATER-BOTTOM

0.00  400.00  2004.28   785.71 1.00000 SEDS1-BOTTOM
0.00  300.00  2254.28   785.71 1.00000 SEDS1-TOP
0.00  600.00  1600.00     0.00 1.00000 WATER-BOTTOM
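
A more compact variant of the same idea, offered only as a sketch: buffer each group in a single string instead of an array, which keeps the original row order and duplicate rows. The file name `rowsets_demo.txt`, the shortened sample rows, and the function name `emit_group` are assumptions made for this illustration.

```shell
# Hypothetical sample file with three groups; the middle one contains SEDS2-TOP.
cat > rowsets_demo.txt <<'EOF'
1 SEDS1-TOP
2 WATER-BOTTOM

3 SEDS2-TOP
4 SEDS1-TOP

5 SEDS1-BOTTOM
EOF

filtered=$(awk '
function emit_group() {              # print the buffered group unless flagged
    if (buf != "" && !del)
        printf "%s\n", buf           # buf ends with a newline, so this adds a blank line
    buf = ""; del = 0                # reset state for the next group
}
NF == 0     { emit_group(); next }   # blank line ends a group
/SEDS2-TOP/ { del = 1 }              # flag the whole group for removal
            { buf = buf $0 "\n" }    # append the line, keeping original order
END         { emit_group() }         # handle the last group
' rowsets_demo.txt)
printf '%s\n' "$filtered"
```

With the sample above, only the first and third groups should be printed, separated by one blank line.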

Look things over and let me know if you have further questions.



Answer 2:

"Separated by a white line" should lead you to paragraph mode, where each blank-line-delimited block is read as a single record.

Perl:

$ perl -00 -ne 'print if !/SEDS2-TOP/' sample.txt
0.00  600.00  2214.28   785.71 1.00000 SEDS1-BOTTOM
0.00  600.00  2214.28   785.71 1.00000 SEDS1-TOP
0.00  600.00  1500.00     0.00 1.00000 WATER-BOTTOM

0.00  400.00  2004.28   785.71 1.00000 SEDS1-BOTTOM
0.00  300.00  2254.28   785.71 1.00000 SEDS1-TOP
0.00  600.00  1600.00     0.00 1.00000 WATER-BOTTOM
  • -00 - enable paragraph mode
  • -n - loop over the input without printing by default
  • print if !/SEDS2-TOP/ - print the paragraph only if it doesn't match

AWK variant:

$ awk -v RS= -v ORS='\n\n' '!/SEDS2-TOP/' sample.txt

0.00  600.00  2214.28   785.71 1.00000 SEDS1-BOTTOM
0.00  600.00  2214.28   785.71 1.00000 SEDS1-TOP
0.00  600.00  1500.00     0.00 1.00000 WATER-BOTTOM

0.00  400.00  2004.28   785.71 1.00000 SEDS1-BOTTOM
0.00  300.00  2254.28   785.71 1.00000 SEDS1-TOP
0.00  600.00  1600.00     0.00 1.00000 WATER-BOTTOM
  • -v RS= - enable paragraph mode
  • -v ORS='\n\n' - separate output records with a blank line
  • !/SEDS2-TOP/ - print only if the paragraph doesn't match
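
Because ORS is written after every record, the output of the command above also ends with a blank line. If that matters, one possible sketch is to print the separator before every record except the first. The file names `sample_demo.txt` and `kept_demo.txt` and the variable `sep` are assumptions for this illustration. The sample also contains two consecutive empty lines, since paragraph mode treats a run of empty lines as a single separator.

```shell
# Hypothetical sample; note the two consecutive empty lines after the first group.
cat > sample_demo.txt <<'EOF'
a SEDS1-TOP
b WATER-BOTTOM


c SEDS2-TOP
d SEDS1-TOP

e SEDS1-BOTTOM
EOF

awk -v RS= '
!/SEDS2-TOP/ {
    printf "%s%s\n", sep, $0   # sep is empty for the first record printed
    sep = "\n"                 # later records get one blank line in front
}' sample_demo.txt > kept_demo.txt
cat kept_demo.txt
```

Note that paragraph mode only splits on truly empty lines, so a line holding just spaces or tabs would not end a group here.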

A cumbersome approach to "move" the matching records into a new file would be:

perl -00 -i -ne 'if (!/SEDS2-TOP/) { print } else {print STDERR}' sample.txt 2>sample2.txt
  • -i modifies sample.txt in place
  • print STDERR - prints the matching paragraphs to STDERR
  • 2>sample2.txt - saves the STDERR into the new file.

However, that requires in-place editing, and not many text utilities support it. The easiest approach is to create two new files, one with the matching records and one with the non-matching ones.

awk -v RS= -v ORS='\n\n' '!/SEDS2-TOP/' sample.txt >not_matching.txt
awk -v RS= -v ORS='\n\n' '/SEDS2-TOP/' sample.txt >matching.txt
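
The same split can probably be done in one pass by letting awk choose the output file per paragraph. This is a sketch only: the file names `split_demo.txt`, `matching_demo.txt` and `not_matching_demo.txt` and the variable `out` are made up for the example.

```shell
# Hypothetical sample with one matching group between two non-matching ones.
cat > split_demo.txt <<'EOF'
a SEDS1-TOP
b WATER-BOTTOM

c SEDS2-TOP
d SEDS1-TOP

e SEDS1-BOTTOM
EOF

awk -v RS= -v ORS='\n\n' '
{
    # pick the destination file depending on whether the paragraph matches
    out = (/SEDS2-TOP/) ? "matching_demo.txt" : "not_matching_demo.txt"
    print > out                # awk keeps each file open between records
}' split_demo.txt
```

One thing to keep in mind: a file is only created once a paragraph is written to it, so if nothing matches, `matching_demo.txt` will not exist afterwards.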


Tags: bash shell