The file content is as follows:
333379266 834640619 88
333379280 834640621 99
333379280 834640621 66
333376672 857526666 99
333376672 857526666 78
333376672 857526666 62
The pairs in the first two columns may be duplicated, and I want to output each unique pair from the first two columns together with the maximum value of the third column for that pair. In this case, the result file should be as follows:
333379266 834640619 88
333379280 834640621 99
333376672 857526666 99
My attempt is:
awk '{d[$1" "$2]=$3;if ($3>=d[$1" "$2]){num[$1" "$2]=$3} else{num[$1" "$2]=d[$1" "$2]} }END{for(i in num) print i,num[i]}'
But it does not work, because $3>=d[$1" "$2] is always true, so the value of num is always $3. Since awk reads the file line by line, the value of num ends up being the last one for each key, not the maximum.
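To see this behaviour, here is a small sketch that wraps the attempt, unchanged, in a shell function and runs it on the sample data. It assumes a POSIX awk on PATH, and original_attempt is a hypothetical name used only for illustration.

```shell
# original_attempt: hypothetical wrapper around the posted attempt, unchanged.
# usage: original_attempt file.txt   (or pipe the data on stdin)
original_attempt() {
  awk '{d[$1" "$2]=$3;if ($3>=d[$1" "$2]){num[$1" "$2]=$3} else{num[$1" "$2]=d[$1" "$2]} }END{for(i in num) print i,num[i]}' "$@"
}
```

Because d[$1" "$2] is set to $3 right before the comparison, the test compares $3 with itself. After sorting the output (for-in order in awk is unspecified), the two duplicated pairs come out with 66 and 62, which are the last values seen, not 99.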
I would appreciate it if anyone could give me a solution. Thanks in advance.
This one-liner applies the same idea as your code; the only difference is that it uses FS instead of a literal space. Could you please try the following.
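A minimal sketch of that idea, assuming whitespace-separated input with a numeric third column (this is an illustration, not necessarily the exact one-liner meant above). The key is built with FS, and the stored value is only replaced when the current $3 is larger. max_third_col is a hypothetical helper name.

```shell
# max_third_col: hypothetical helper; prints each unique (col1, col2) pair
# with the largest col3 seen for it, keeping the order pairs first appear.
# usage: max_third_col file.txt   (or pipe the data on stdin)
max_third_col() {
  awk '
  {
    key = $1 FS $2                      # join the first two fields with FS (a space by default)
    if (!(key in max)) {                # first time this pair appears
      order[++n] = key                  # remember first-seen order
      max[key] = $3
    } else if ($3 + 0 > max[key] + 0) { # force a numeric comparison with the stored max
      max[key] = $3
    }
  }
  END {
    for (i = 1; i <= n; i++) print order[i], max[order[i]]
  }' "$@"
}
```

Tracking first-seen order in the order array is a design choice to make the output deterministic; a plain for (k in max) loop would also work, but its order is unspecified.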
Issues with OP's code:
On your line
d[$1" "$2]=$3;if ($3>=d[$1" "$2])
you assign the current line's 3rd field to array d before comparing against it, so the condition is always true; that is the major issue I could see in OP's attempt.
OP's attempt fix: IMHO my solution above should be good, but I am trying to fix OP's attempt here.
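One possible fix that stays close to the original attempt, sketched under the assumption that the third column is numeric: compare $3 with the stored value before overwriting it, and only update on the first sighting of a pair or when the new value is larger. fixed_attempt is a hypothetical name, and this is an illustration rather than necessarily the fix the answer had in mind.

```shell
# fixed_attempt: hypothetical helper; same key format as the original attempt,
# but the comparison happens before any assignment.
fixed_attempt() {
  awk '{
    k = $1 " " $2
    if (!(k in num) || $3 > num[k]) {  # first sighting, or a new maximum
      num[k] = $3
    }
  }
  END { for (i in num) print i, num[i] }' "$@"   # for-in order is unspecified
}
```

Dropping the separate d array removes the assign-then-compare problem entirely. Since for (i in num) visits keys in an unspecified order, pipe the output through sort if the order matters.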