I learned a really handy way to remove duplicate lines while retaining their order from Remove duplicates without sorting file - BASH.
Say you have the following file:
$cat file
a
a
b
b
a
c
you can use the following to remove the duplicate lines:
$awk '!x[$1]++' file
a
b
c
How does this work in terms of precedence of operations?
In AWK, arrays are associative, so the first column or first field of each line, $1, is used as an index into the array x. The expression is parsed as !(x[$(1)]++). So, from the inside out, it's:

1. $(1): the value of field 1 (note that $ is an operator in AWK, unlike in Perl).

2. x[$(1)]: index the array x with the value of field 1; if x is an unbound variable, bind it to a new associative array.

3. x[$(1)]++: post-increment; a rule similar to the one in C applies, so the value of the expression is that of x[$(1)] prior to the increment, which will be zero if x[$(1)] has not yet been assigned a value.

4. !x[$(1)]++: logical negation; the result is 1 (true) if x[$(1)] is zero, i.e. the first time that value of $(1) is seen. After the increment, x[$(1)] gets a non-zero value. So, the next time, x[$(1)] for the same value of $(1) will return 1, and the negated expression will be false.

This expression is then evaluated for every line in the input and determines whether the implied default action of awk should be executed, which is to echo the line to stdout.
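The post-increment rule is easy to check on its own: an uninitialized awk variable evaluates to zero in a numeric context, and c++ yields the old value while leaving the incremented value behind. Any POSIX awk behaves this way:
$awk 'BEGIN { print c++; print c++; print c }'
0
1
2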
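Written out long-hand, the one-liner is equivalent to something like the following sketches, with the default action made explicit and the comparison against the pre-increment value spelled out (same sample file as above):
$awk '!x[$1]++ { print $0 }' file
a
b
c
$awk '{ if (x[$1]++ == 0) print $0 }' file
a
b
c
Note that indexing on $1 deduplicates by the first field only; to compare whole lines you would index on $0 instead (!x[$0]++), which gives the same result here because each line of the sample file has just one field.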