sed or awk to delete a block

Posted 2019-09-15 07:57

Question:

My input file has blocks like this:

[abc]  
para1=123  
para2=456  
para3=111  

[pqr]  
para1=333    
para2=765    
para3=1345    

[xyz]    
para1=888    
para2=236    
para3=964    

[pqr]    
para1=tyu    
para2=ghj     
para3=ghjk     

[xyz]    
para1=qwe    
para2=asd    
para3=zxc    

Now I need to delete the duplicate blocks using sed or awk. When a block header appears more than once, the occurrences nearer the top of the file should be deleted, so that only the last one is kept. For example, for the input above the output should be:

[abc]  
para1=123  
para2=456  
para3=111  

[pqr]    
para1=tyu    
para2=ghj     
para3=ghjk     

[xyz]    
para1=qwe    
para2=asd    
para3=zxc   

Answer 1:

This is what I get using awk (not sure whether you forgot the abc block). Note that it keeps the first occurrence of each block:

awk '!a[$1]++' RS= ORS="\n\n" file
[abc]
para1=123
para2=456
para3=111

[pqr]
para1=333
para2=765
para3=1345

[xyz]
para1=888
para2=236
para3=964
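
Since the question asks for the last occurrence of each block rather than the first, the same one-liner idea can be adapted by reading the file twice. The following is only a sketch, not part of the original answer: it assumes the input is a regular file that can be read twice (not a pipe), and the sample data and the names `file`, `last` and `result` are made up for illustration.

```shell
# Hypothetical sample file, shortened for illustration.
file=$(mktemp)
printf '%s\n' '[abc]' 'para1=123' '' '[pqr]' 'para1=333' '' '[pqr]' 'para1=tyu' > "$file"

# Pass 1 (NR==FNR): remember the record number of the last block seen for each header.
# Pass 2: print a block only if its record number is that remembered one.
result=$(awk 'NR==FNR { last[$1] = FNR; next } last[$1] == FNR' RS= ORS='\n\n' "$file" "$file")
printf '%s\n' "$result"

rm -f "$file"
```

Each surviving block stays at the position of its last occurrence, and only the record numbers are held in memory, not the blocks themselves.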


Answer 2:

$ cat tst.awk
BEGIN{ RS=""; ORS="\n\n" }
!seen[$1]++ { keys[++numKeys] = $1 }
{ rec[$1] = $0 }
END {
    for (k=1; k<=numKeys; k++) {
        print rec[keys[k]]
    }
}

$ awk -f tst.awk file
[abc]
para1=123
para2=456
para3=111

[pqr]
para1=tyu
para2=ghj
para3=ghjk

[xyz]
para1=qwe
para2=asd
para3=zxc


Answer 3:

This keeps the last instance of each block, not the first:

 tac file | awk -F"\n" '!x[$NF]++' RS= ORS="\n\n"  |  tac

One slight problem with this method: because the field separator is a newline, the whole header line, including any trailing whitespace, is used as the key. Duplicate headers therefore need exactly the same amount of trailing whitespace to be recognised as duplicates.
Otherwise it should work perfectly :)
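
One way around the trailing-whitespace caveat is to strip trailing whitespace before deduplicating. This is a minimal sketch rather than part of the original answer: it assumes that losing trailing whitespace in the output is acceptable and that `tac` (GNU coreutils) is available, and the sample data and the names `file` and `result` are hypothetical.

```shell
# Hypothetical sample: two [pqr] headers that differ only in trailing spaces.
file=$(mktemp)
printf '%s\n' '[pqr]  ' 'para1=333' '' '[pqr]' 'para1=tyu' > "$file"

# Strip trailing whitespace so identical headers compare equal, reverse the
# lines, keep the first (originally last) block per header, then reverse back.
result=$(sed 's/[[:space:]]*$//' "$file" | tac | awk -F'\n' '!x[$NF]++' RS= ORS='\n\n' | tac)
printf '%s\n' "$result"

rm -f "$file"
```

As with the original pipeline, the final `tac` turns the trailing blank line into a leading one, so the output starts with an empty line.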

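 tac file | awk '!x[$NF]++' RS= ORS="\n\n"  |  tac

This also works :) With the default field separator the fields are whitespace-separated words, so after tac the last field ($NF) is the block header and trailing whitespace no longer matters. Note that $(NF-1) would be the para1 line instead, which is not the key to deduplicate on.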



Tags: bash shell awk sed