I have a file containing the lines shown below. Each set of rows is separated by a blank line. I want to delete an entire set of rows from the file if any line in that set contains the keyword SEDS2-TOP.
0.00 600.00 2214.28 785.71 1.00000 SEDS1-BOTTOM
0.00 600.00 2214.28 785.71 1.00000 SEDS1-TOP
0.00 600.00 1500.00 0.00 1.00000 WATER-BOTTOM
0.00 600.00 3446.97 1757.08 1.00000 SEDS2-TOP
0.00 600.00 2218.64 790.51 1.00000 SEDS1-BOTTOM
0.00 600.00 2218.64 790.51 1.00000 SEDS1-TOP
0.00 0.00 600.00 1500.00 1.00000 WATER-BOTTOM
0.00 400.00 2004.28 785.71 1.00000 SEDS1-BOTTOM
0.00 300.00 2254.28 785.71 1.00000 SEDS1-TOP
0.00 600.00 1600.00 0.00 1.00000 WATER-BOTTOM
0.00 600.00 3446.97 1757.08 1.00000 SEDS2-TOP
0.00 600.00 1500.00 0.00 1.00000 WATER-BOTTOM
0.00 600.00 3446.97 1757.08 1.00000 SEDS2-TOP
For example, the output file should contain:
0.00 600.00 2214.28 785.71 1.00000 SEDS1-BOTTOM
0.00 600.00 2214.28 785.71 1.00000 SEDS1-TOP
0.00 600.00 1500.00 0.00 1.00000 WATER-BOTTOM
0.00 400.00 2004.28 785.71 1.00000 SEDS1-BOTTOM
0.00 300.00 2254.28 785.71 1.00000 SEDS1-TOP
0.00 600.00 1600.00 0.00 1.00000 WATER-BOTTOM
You can do it in awk using three rules and the END rule. It can be written as follows:
awk 'NF==0 { # empty line
for (i in a) # for each line in array a
print i # output line (index)
if (i in a) # if lines exists
print "" # output blank line at end
delete a # clear a array
del=0 # set delete group flag 0
next # get next record
}
/SEDS2-TOP/ { # SEDS2-TOP matched in record
del=1 # set delete group flag 1
delete a # delete array a
next # get next records
}
del==0 { # del group flag is zero
a[$0]++ # add line as index to array a
}
END { # END rule - process last group of lines
if (del==0) { # if del group flag not set
for (i in a) # loop over lines in a
print i # output line (index)
print "" # with newline after
}
}' rowsets
Example Use/Output
Using your data file as input, you can simply select-copy the script above (and change the filename containing the row-sets from rowsets to whatever you have), then middle-mouse paste it into your terminal in the directory containing the file, e.g.
$ awk 'NF==0 { # empty line
> for (i in a) # for each line in array a
> print i # output line (index)
> if (i in a) # if lines exists
> print "" # output blank line at end
> delete a # clear a array
> del=0 # set delete group flag 0
> next # get next record
> }
> /SEDS2-TOP/ { # SEDS2-TOP matched in record
> del=1 # set delete group flag 1
> delete a # delete array a
> next # get next records
> }
> del==0 { # del group flag is zero
> a[$0]++ # add line as index to array a
> }
> END { # END rule - process last group of lines
> if (del==0) { # if del group flag not set
> for (i in a) # loop over lines in a
> print i # output line (index)
> print "" # with newline after
> }
> }' rowsets
0.00 600.00 1500.00 0.00 1.00000 WATER-BOTTOM
0.00 600.00 2214.28 785.71 1.00000 SEDS1-BOTTOM
0.00 600.00 2214.28 785.71 1.00000 SEDS1-TOP
0.00 400.00 2004.28 785.71 1.00000 SEDS1-BOTTOM
0.00 300.00 2254.28 785.71 1.00000 SEDS1-TOP
0.00 600.00 1600.00 0.00 1.00000 WATER-BOTTOM
Preserving Row Order
If preserving the row-order is needed, then instead of using the line as the index, you can introduce a new counter variable to be used as the index that would correspond to the row number in the array. That allows you to output the rows in their original order, e.g.
awk -v ndx=1 '
NF==0 { # empty line
for (i=1; i<ndx; i++) # for each line in array a
print a[i] # output line
if (ndx > 1) # if lines exists
print "" # output blank line at end
delete a # clear a array
del=0 # set delete group flag 0
ndx=1 # reset array index 1
next # get next record
}
/SEDS2-TOP/ { # SEDS2-TOP matched in record
del=1 # set delete group flag 1
delete a # delete array a
ndx=1 # reset array index 1
next # get next records
}
del==0 { # del group flag is zero
a[ndx++]=$0 # add line to array a
}
END { # END rule - process last group of lines
if (del==0) { # if del group flag not set
for (i=1; i<ndx; i++) # loop over lines in a
print a[i] # output line
print "" # with newline after
}
}' rowsets
In that case, your output would be:
0.00 600.00 2214.28 785.71 1.00000 SEDS1-BOTTOM
0.00 600.00 2214.28 785.71 1.00000 SEDS1-TOP
0.00 600.00 1500.00 0.00 1.00000 WATER-BOTTOM
0.00 400.00 2004.28 785.71 1.00000 SEDS1-BOTTOM
0.00 300.00 2254.28 785.71 1.00000 SEDS1-TOP
0.00 600.00 1600.00 0.00 1.00000 WATER-BOTTOM
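If keeping an array feels heavier than you need, the same idea can be sketched by buffering each group in a single string, which also preserves row order. This is only a minimal sketch, not a replacement for the scripts above: the file names rowsets_demo.txt and kept_demo.txt and the variable buf are made up for the illustration, and unlike the scripts above it does not print a blank line after the final group.

```shell
# Hypothetical demo input: three groups separated by blank lines
cat > rowsets_demo.txt <<'EOF'
0.00 600.00 2214.28 785.71 1.00000 SEDS1-BOTTOM
0.00 600.00 1500.00 0.00 1.00000 WATER-BOTTOM

0.00 600.00 3446.97 1757.08 1.00000 SEDS2-TOP
0.00 600.00 2218.64 790.51 1.00000 SEDS1-TOP

0.00 400.00 2004.28 785.71 1.00000 SEDS1-BOTTOM
EOF

awk '
NF == 0 {                        # blank line: the current group is complete
    if (!del && buf != "")       # group has lines and no keyword was seen
        printf "%s\n", buf       # buf ends in a newline, so this adds the blank line
    buf = ""; del = 0            # reset for the next group
    next
}
/SEDS2-TOP/ { del = 1 }          # keyword seen: mark the whole group for deletion
{ buf = buf $0 "\n" }            # append the line to the group buffer, in order
END {
    if (!del && buf != "")       # flush the last group if the file has no trailing blank line
        printf "%s", buf
}' rowsets_demo.txt > kept_demo.txt

cat kept_demo.txt
```

With the demo input above, the first and third groups are kept and the middle group is dropped.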
Look things over and let me know if you have further questions.
"Separated by a blank line" should lead you to paragraph mode.
Perl:
$ perl -00 -ne 'print if !/SEDS2-TOP/' sample.txt
0.00 600.00 2214.28 785.71 1.00000 SEDS1-BOTTOM
0.00 600.00 2214.28 785.71 1.00000 SEDS1-TOP
0.00 600.00 1500.00 0.00 1.00000 WATER-BOTTOM
0.00 400.00 2004.28 785.71 1.00000 SEDS1-BOTTOM
0.00 300.00 2254.28 785.71 1.00000 SEDS1-TOP
0.00 600.00 1600.00 0.00 1.00000 WATER-BOTTOM
-00 - enable paragraph mode
-n - don't print by default
print if !/SEDS2-TOP/ - print the paragraph only if it doesn't match
AWK variant:
$ awk -v RS= -v ORS='\n\n' '!/SEDS2-TOP/' sample.txt
0.00 600.00 2214.28 785.71 1.00000 SEDS1-BOTTOM
0.00 600.00 2214.28 785.71 1.00000 SEDS1-TOP
0.00 600.00 1500.00 0.00 1.00000 WATER-BOTTOM
0.00 400.00 2004.28 785.71 1.00000 SEDS1-BOTTOM
0.00 300.00 2254.28 785.71 1.00000 SEDS1-TOP
0.00 600.00 1600.00 0.00 1.00000 WATER-BOTTOM
-v RS= - enable paragraph mode
-v ORS='\n\n' - end each output record with two newlines, so the groups stay separated by one blank line
!/SEDS2-TOP/ - print only if the paragraph doesn't match
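To see what paragraph mode does to the records, here is a small sketch you can paste into a scratch directory. The file name sample_demo.txt and the keep/drop labels are made up for the illustration. With RS set to the empty string, awk treats each blank-line-separated group as one record, so NR counts groups rather than lines and the regex is tested against the whole group.

```shell
# Hypothetical demo input: three groups separated by blank lines
cat > sample_demo.txt <<'EOF'
0.00 600.00 2214.28 785.71 1.00000 SEDS1-BOTTOM
0.00 600.00 1500.00 0.00 1.00000 WATER-BOTTOM

0.00 600.00 3446.97 1757.08 1.00000 SEDS2-TOP
0.00 600.00 2218.64 790.51 1.00000 SEDS1-TOP

0.00 400.00 2004.28 785.71 1.00000 SEDS1-BOTTOM
EOF

# One output line per group: the record number and whether the filter would keep it
awk -v RS= '{ printf "group %d: %s\n", NR, (/SEDS2-TOP/ ? "drop" : "keep") }' sample_demo.txt
```

For this input it should report group 1 as keep, group 2 as drop, and group 3 as keep.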
A cumbersome approach to "move" the matching records into a new file would be:
perl -00 -i -ne 'if (!/SEDS2-TOP/) { print } else {print STDERR}' sample.txt 2>sample2.txt
-i - modifies sample.txt in place
print STDERR - prints the matching paragraphs to STDERR
2>sample2.txt - saves STDERR into the new file
However, that requires in-place editing, and not many text utilities support it. The easiest approach is to create two new files, one with the matching records and one with the non-matching ones.
awk -v RS= -v ORS='\n\n' '!/SEDS2-TOP/' sample.txt >not_matching.txt
awk -v RS= -v ORS='\n\n' '/SEDS2-TOP/' sample.txt >matching.txt
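If reading the input twice bothers you, the two commands can probably be folded into one pass by choosing the output file per record. This is a sketch under assumptions rather than a tested drop-in: the file names sample_demo.txt, matching_demo.txt and not_matching_demo.txt and the variable out are made up for the example. Note that an output file is only created if at least one group is written to it.

```shell
# Hypothetical demo input: three groups separated by blank lines
cat > sample_demo.txt <<'EOF'
0.00 600.00 2214.28 785.71 1.00000 SEDS1-BOTTOM
0.00 600.00 1500.00 0.00 1.00000 WATER-BOTTOM

0.00 600.00 3446.97 1757.08 1.00000 SEDS2-TOP
0.00 600.00 2218.64 790.51 1.00000 SEDS1-TOP

0.00 400.00 2004.28 785.71 1.00000 SEDS1-BOTTOM
EOF

# Start from a clean slate so reruns do not see stale output
rm -f matching_demo.txt not_matching_demo.txt

awk -v RS= -v ORS='\n\n' '
{
    # pick the destination for this whole group (paragraph)
    out = (/SEDS2-TOP/ ? "matching_demo.txt" : "not_matching_demo.txt")
    print > out                  # awk keeps each file open and appends later groups
}' sample_demo.txt
```

Afterwards matching_demo.txt should hold the group containing SEDS2-TOP and not_matching_demo.txt the other two groups.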