How to delete duplicated rows based on a column value

Posted 2019-02-17 03:17

Question:

Given the following table:

 123456.451 entered-auto_attendant
 123456.451 duration:76 real:76
 139651.526 entered-auto_attendant
 139651.526 duration:62 real:62
 139382.537 entered-auto_attendant 

Using a bash shell script on Linux, I'd like to delete all rows that duplicate the value in column 1 (the one with the long number). Keep in mind that this number varies from row to row.

I've tried:

awk '{a[$3]++}!(a[$3]-1)' file

sort -u | uniq

But I am not getting the result I want. The values in the first column should be compared across all rows, the duplicates deleted, and the remaining rows shown, something like this:

 123456.451 entered-auto_attendant
 139651.526 entered-auto_attendant
 139382.537 entered-auto_attendant 

Answer 1:

You didn't give an expected output. Does this work for you?

 awk '!a[$1]++' file

With your data, the output is:

123456.451 entered-auto_attendant
139651.526 entered-auto_attendant
139382.537 entered-auto_attendant
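
For anyone puzzled by how `!a[$1]++` works, here is a long-form sketch of the same idea. It assumes the question's rows are saved to a temporary file; the names `seen` and `first_seen` are mine and purely illustrative.

```shell
# Sample data copied from the question, written to a temporary file.
file=$(mktemp)
cat > "$file" <<'EOF'
123456.451 entered-auto_attendant
123456.451 duration:76 real:76
139651.526 entered-auto_attendant
139651.526 duration:62 real:62
139382.537 entered-auto_attendant
EOF

# Long form of awk '!a[$1]++': print a line only the first time its
# first field is seen, then count the key.
first_seen=$(awk '{
    if (seen[$1] == 0) print   # key not seen yet, so keep this line
    seen[$1]++                 # remember that the key has occurred
}' "$file")

printf '%s\n' "$first_seen"
rm -f "$file"
```

The one-liner is shorter because `a[$1]++` yields the old count (0 on first sight), `!` turns that into true, and a true pattern with no action prints the line.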

And this one prints only the lines whose column 1 value occurs exactly once:

 awk '{a[$1]++;b[$1]=$0}END{for(x in a)if(a[x]==1)print b[x]}' file

Output:

139382.537 entered-auto_attendant
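
Note that `for (x in a)` visits keys in an unspecified order, so with several unique keys the output order may differ from the input. A possible alternative, not the command above but a common two-pass idiom, reads the file twice and keeps input order. The variable names are illustrative.

```shell
# Sample data copied from the question, written to a temporary file.
file=$(mktemp)
cat > "$file" <<'EOF'
123456.451 entered-auto_attendant
123456.451 duration:76 real:76
139651.526 entered-auto_attendant
139651.526 duration:62 real:62
139382.537 entered-auto_attendant
EOF

# First pass (NR == FNR): count how often each column 1 value occurs.
# Second pass: print, in input order, the lines whose key occurred once.
only_once=$(awk 'NR == FNR { count[$1]++; next } count[$1] == 1' "$file" "$file")

printf '%s\n' "$only_once"
rm -f "$file"
```

This needs a real file (it cannot read a pipe twice), which is the trade-off for keeping the original order.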


Answer 2:

uniq, by default, compares the entire line. Since your lines are not identical, they are not removed.
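
A quick way to see this on the question's data is to count what survives a whole-line deduplication and compare it with the number of distinct column 1 values. This is only a sketch; the temporary file and variable names are assumptions for illustration.

```shell
# Sample data copied from the question, written to a temporary file.
file=$(mktemp)
cat > "$file" <<'EOF'
123456.451 entered-auto_attendant
123456.451 duration:76 real:76
139651.526 entered-auto_attendant
139651.526 duration:62 real:62
139382.537 entered-auto_attendant
EOF

# sort -u and uniq both compare complete lines, so rows that share only
# their first column are all kept.
whole_line_count=$(sort -u "$file" | uniq | wc -l)

# Counting distinct values of column 1 shows how many rows should remain.
distinct_keys=$(awk '{ print $1 }' "$file" | sort -u | wc -l)

# $(( )) strips any padding that some wc implementations add.
echo "lines after sort -u | uniq: $((whole_line_count))"
echo "distinct column 1 values:   $((distinct_keys))"
rm -f "$file"
```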

You can use sort to conveniently sort by the first field and also delete duplicates of it:

sort -t ' ' -k 1,1 -u file
  • -t ' ': fields are separated by spaces
  • -k 1,1: only look at the first field
  • -u: delete duplicates
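
Two caveats are worth checking before relying on this: the output comes back sorted by key rather than in input order, and as far as I can tell POSIX does not say which row of a duplicate group survives. The sketch below assumes a sort that supports the `-s` (stable) extension, as GNU and BSD sort do, and rows without leading spaces (with `-t ' '`, a leading space would make field 1 empty).

```shell
# Sample data copied from the question, written to a temporary file.
file=$(mktemp)
cat > "$file" <<'EOF'
123456.451 entered-auto_attendant
123456.451 duration:76 real:76
139651.526 entered-auto_attendant
139651.526 duration:62 real:62
139382.537 entered-auto_attendant
EOF

# -s keeps rows with equal keys in input order, so -u is expected to
# retain the first row of each group (an assumption about GNU/BSD sort);
# LC_ALL=C makes the key comparison byte-wise.
deduped=$(LC_ALL=C sort -s -t ' ' -k 1,1 -u "$file")

printf '%s\n' "$deduped"
rm -f "$file"
```

If the original row order matters, the awk approach is the safer choice, since it never reorders anything.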

Additionally, you might have seen the awk '!a[$0]++' trick for deduplicating lines. You can make this dedupe on the first column only using awk '!a[$1]++'.



Answer 3:

Using awk, this prints the duplicate rows, i.e. every row whose first column has already been seen:

awk '!($1 in a){a[$1]++; next} $1 in a' file

Output:

123456.451 duration:76 real:76
139651.526 duration:62 real:62
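
The command above prints the second and later occurrences of each column 1 value, which is the complement of what `!a[$1]++` keeps. A shorter form that I believe behaves the same way is sketched here; the variable name `dups` is illustrative.

```shell
# Sample data copied from the question, written to a temporary file.
file=$(mktemp)
cat > "$file" <<'EOF'
123456.451 entered-auto_attendant
123456.451 duration:76 real:76
139651.526 entered-auto_attendant
139651.526 duration:62 real:62
139382.537 entered-auto_attendant
EOF

# a[$1]++ yields the old count: 0 (false) on first sight, non-zero (true)
# on every repeat, so only the repeated rows are printed.
dups=$(awk 'a[$1]++' "$file")

printf '%s\n' "$dups"
rm -f "$file"
```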


Answer 4:

Try this command:

awk '!x[$1]++ { print $1, $2 }' file
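
Since the question asks for a bash script, the same filter could be wrapped in a small function that takes the column number as an argument. This is only a sketch: `dedupe_by_column` is a hypothetical helper name, not an existing tool, and unlike `print $1, $2` it prints each kept line unchanged, including any fields after the second.

```shell
# dedupe_by_column is a hypothetical helper, not an existing command.
# Usage: dedupe_by_column COLUMN FILE
dedupe_by_column() {
    # The column number is passed into awk with -v; $col is that field.
    awk -v col="$1" '!seen[$col]++' "$2"
}

# Sample data copied from the question, written to a temporary file.
file=$(mktemp)
cat > "$file" <<'EOF'
123456.451 entered-auto_attendant
123456.451 duration:76 real:76
139651.526 entered-auto_attendant
139651.526 duration:62 real:62
139382.537 entered-auto_attendant
EOF

# Keep the first row for each distinct value of column 1.
by_col1=$(dedupe_by_column 1 "$file")
printf '%s\n' "$by_col1"
rm -f "$file"
```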