my input file has blocks like
[abc]
para1=123
para2=456
para3=111

[pqr]
para1=333
para2=765
para3=1345

[xyz]
para1=888
para2=236
para3=964

[pqr]
para1=tyu
para2=ghj
para3=ghjk

[xyz]
para1=qwe
para2=asd
para3=zxc
Now I need to delete the duplicate blocks using sed or awk, keeping only the last occurrence of each block; the block that appears first from the top of the file has to be deleted. For example, for the input above the output should be:
[abc]
para1=123
para2=456
para3=111

[pqr]
para1=tyu
para2=ghj
para3=ghjk

[xyz]
para1=qwe
para2=asd
para3=zxc
I do get this from using awk (not sure if you forgot the abc block):
awk '!a[$1]++' RS= ORS="\n\n" file
[abc]
para1=123
para2=456
para3=111

[pqr]
para1=333
para2=765
para3=1345

[xyz]
para1=888
para2=236
para3=964
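As an aside, here is a commented spelling-out of that one-liner (just a sketch; it assumes the blocks are separated by blank lines so awk's paragraph mode sees each block as one record). It shows why it keeps the first occurrence of each block rather than the last:

awk '
# RS= (empty RS) turns on paragraph mode: each blank-line-separated block is
# one record, and $1 is its first whitespace-separated field, i.e. the [header].
# a[$1]++ is 0 (false) only the first time a header is seen, so !a[$1]++
# prints the first block for each header and drops the later duplicates.
!a[$1]++
' RS= ORS="\n\n" file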
$ cat tst.awk
BEGIN{ RS=""; ORS="\n\n" }
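# remember each block header the first time it appears, preserving that order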
!seen[$1]++ { keys[++numKeys] = $1 }
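# always overwrite, so rec[] ends up holding the last block seen for each header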
{ rec[$1] = $0 }
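# print the last instance of each block, in the order the headers first appeared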
END {
    for (k=1; k<=numKeys; k++) {
        print rec[keys[k]]
    }
}
$ awk -f tst.awk file
[abc]
para1=123
para2=456
para3=111

[pqr]
para1=tyu
para2=ghj
para3=ghjk

[xyz]
para1=qwe
para2=asd
para3=zxc
This keeps the last instance of each block, not the first:
tac file | awk -F"\n" '!x[$NF]++' RS= ORS="\n\n" | tac
A slight problem with this method is that, since the field separator is a newline, the whole line (including any trailing whitespace) is the field, so the header lines have to have the same amount of whitespace after the text in order to count as duplicates. Otherwise it should work perfectly :)
tac file | awk '!x[$(NF-1)]++' RS= ORS="\n\n" | tac
This also works :)
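For completeness, here is the first tac pipeline written out with comments (same logic, only reformatted; it still assumes blank-line-separated blocks and GNU tac):

tac file |
awk '
BEGIN { RS = ""; FS = "\n"; ORS = "\n\n" }
# tac reverses the line order, so inside each reversed block the [header]
# line is now the last field, $NF (fields are lines because FS is "\n").
# Keeping only the first record per header in the reversed stream keeps the
# last one from the original order; the trailing tac restores the line order.
!x[$NF]++
' |
tac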