I have a .csv file like this:
stack2@example.com,2009-11-27 01:05:47.893000000,example.net,127.0.0.1
overflow@example.com,2009-11-27 00:58:29.793000000,example.net,255.255.255.0
overflow@example.com,2009-11-27 00:58:29.646465785,example.net,256.255.255.0
...
I need to remove lines with duplicate e-mail addresses (the entire line) from the file, i.e. one of the two lines containing overflow@example.com in the example above. How do I use uniq on only field 1 (separated by commas)? According to the man page, uniq has no option for selecting comma-separated columns.
I tried something with sort | uniq, but it doesn't work.
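For comparison, sort can deduplicate on a single key by itself, without uniq. This is a minimal sketch that assumes GNU coreutils sort and an e-mail field that never contains a comma. The function name dedupe_first_field is hypothetical:

```shell
# Hypothetical helper: keep one line per value of CSV field 1.
#   -t,    treat the comma as the field separator
#   -k1,1  build the sort key from field 1 only
#   -u     output only one line from each run of lines whose keys compare equal
dedupe_first_field() {
  sort -u -t, -k1,1 "$@"
}

# Usage on the sample rows from the question (reads stdin when no file is given).
printf '%s\n' \
  'stack2@example.com,2009-11-27 01:05:47.893000000,example.net,127.0.0.1' \
  'overflow@example.com,2009-11-27 00:58:29.793000000,example.net,255.255.255.0' \
  'overflow@example.com,2009-11-27 00:58:29.646465785,example.net,256.255.255.0' \
  | dedupe_first_field
```

The output comes back sorted by e-mail. GNU sort keeps the first line of each run of equal keys in input order; POSIX only promises that one line per key survives.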
If you want to retain the last one of the duplicates, you could reverse the file with tac, deduplicate it with sort, and then reverse the result again.
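A sketch of that reverse, deduplicate, reverse idea. It assumes GNU coreutils (tac is not in POSIX; on macOS, tail -r is the usual substitute) and relies on GNU sort -u keeping the first line, in input order, of each run of equal keys. The name keep_last_per_email is hypothetical:

```shell
# Hypothetical helper: keep the LAST line seen for each CSV field-1 value.
keep_last_per_email() {
  # 1. tac reverses the input, so the last occurrence of each key comes first.
  # 2. sort -u keeps one line per key (GNU: the first one in input order);
  #    -r sorts the keys in descending order...
  # 3. ...so the final tac puts the result back in ascending key order.
  tac "$@" | sort -u -t, -r -k1,1 | tac
}

printf '%s\n' \
  'stack2@example.com,2009-11-27 01:05:47.893000000,example.net,127.0.0.1' \
  'overflow@example.com,2009-11-27 00:58:29.793000000,example.net,255.255.255.0' \
  'overflow@example.com,2009-11-27 00:58:29.646465785,example.net,256.255.255.0' \
  | keep_last_per_email
```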
Which was my requirement here.
tac will reverse the file line by line.
Here is a very nifty way.
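First, format the content so that the column to be compared for uniqueness has a fixed width. One way of doing this is awk's printf with a field width specifier such as "%-15s", which left-justifies the value and pads it with trailing spaces to 15 characters (plain "%15s" would pad on the left instead).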
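Now the -f and -w options of uniq can be used to skip the preceding fields and to limit the comparison to the width of the column(s). Keep in mind that uniq only removes adjacent duplicate lines, so the input has to be sorted (or at least grouped) on that column first. Also, -f counts blank-separated fields, not comma-separated ones, and -w is a GNU extension.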
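A small illustration of the two options on made-up, whitespace-separated rows, assuming GNU uniq (POSIX uniq has -f and -s but not -w). One detail is easy to miss: after -f skips a field, the blank in front of the next field is still part of the compared text, so -w has to count it:

```shell
# Column 2 is the key; every key is exactly 3 characters wide and the
# rows are already grouped by that key.
printf '%s\n' \
  '1 abc first' \
  '2 abc second' \
  '3 abd third' \
  | uniq -f 1 -w 4
# -f 1 skips "1", "2", "3"; what remains starts with " abc" / " abd".
# -w 4 compares those 4 characters (leading blank + 3-char key),
# so the two "abc" rows collapse into the first one.
# prints:
# 1 abc first
# 3 abd third
```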
Here are three examples.
In the first example...
1) Temporarily pad the column of interest to a fixed width greater than or equal to the field's maximum width.
2) Use the -f uniq option to skip the prior columns, and the -w uniq option to limit the comparison width to the tmp_fixed_width.
3) Remove trailing spaces from the column to "restore" its width (assuming there were no trailing spaces beforehand).
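The three steps can be sketched for the question's CSV, where the key is field 1. This is not the answer's original code: the function name, WIDTH=30 and the sort step are assumptions, and it needs GNU uniq for -w, ASCII keys no longer than WIDTH, and keys without trailing spaces of their own:

```shell
# Hypothetical sketch of the first example for CSV field 1 (the e-mail).
WIDTH=30

uniq_padded_field1() {
  # 1) Pad field 1 to WIDTH characters (left-justified, trailing spaces).
  awk -F, -v OFS=, -v w="$WIDTH" '{ $1 = sprintf("%-" w "s", $1); print }' "$@" |
    # Group identical keys so uniq sees them as adjacent lines.
    LC_ALL=C sort -t, -k1,1 |
    # 2) Compare only the first WIDTH characters, i.e. the padded key.
    #    The key is the first column, so there is nothing to skip with -f.
    uniq -w "$WIDTH" |
    # 3) Strip the padding from field 1 again.
    awk -F, -v OFS=, '{ sub(/ +$/, "", $1); print }'
}

printf '%s\n' \
  'stack2@example.com,2009-11-27 01:05:47.893000000,example.net,127.0.0.1' \
  'overflow@example.com,2009-11-27 00:58:29.793000000,example.net,255.255.255.0' \
  'overflow@example.com,2009-11-27 00:58:29.646465785,example.net,256.255.255.0' \
  | uniq_padded_field1
```

For a later column, -f would be needed, but uniq would count the space inside the timestamp as a field boundary and ignore the commas, which makes the field count awkward for this file.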
In the second example...
Create a new column 1 that holds the fixed-width key for uniq. Then remove it after the uniq filter has been applied.
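A sketch of the second example for the same CSV, again not the answer's original code. The function name and WIDTH=30 are assumptions; GNU uniq and ASCII keys no longer than WIDTH are assumed:

```shell
# Hypothetical sketch: prepend a fixed-width key column, dedupe on it, drop it.
WIDTH=30

uniq_via_key_column() {
  # New column 1: field 1 padded to WIDTH characters, then one space,
  # then the untouched original line.
  awk -F, -v w="$WIDTH" '{ printf("%-" w "s %s\n", $1, $0) }' "$@" |
    # Lines sharing the same WIDTH-character prefix sort next to each other.
    LC_ALL=C sort |
    # Compare only the new key column.
    uniq -w "$WIDTH" |
    # Drop the key column (WIDTH characters plus the separating space).
    cut -c "$((WIDTH + 2))-"
}

printf '%s\n' \
  'stack2@example.com,2009-11-27 01:05:47.893000000,example.net,127.0.0.1' \
  'overflow@example.com,2009-11-27 00:58:29.793000000,example.net,255.255.255.0' \
  'overflow@example.com,2009-11-27 00:58:29.646465785,example.net,256.255.255.0' \
  | uniq_via_key_column
```

Because the original line is never modified, this variant does not depend on the key having no trailing spaces.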
The third example is the same as the second, but for multiple columns.
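A sketch of the multi-column variant, assuming for illustration that the key is the pair (field 1, field 3), i.e. e-mail plus domain. The widths W1 and W2 and the function name are hypothetical and must be at least as wide as the longest value in each column; GNU uniq and ASCII keys are assumed:

```shell
# Hypothetical sketch: combined key from CSV fields 1 and 3.
W1=30
W2=20

uniq_via_two_key_columns() {
  # New column 1: field 1 padded to W1, field 3 padded to W2, one space,
  # then the untouched original line.
  awk -F, -v w1="$W1" -v w2="$W2" \
    '{ printf("%-" w1 "s%-" w2 "s %s\n", $1, $3, $0) }' "$@" |
    LC_ALL=C sort |
    # Compare the whole combined key (W1 + W2 characters).
    uniq -w "$((W1 + W2))" |
    # Drop the key column and its separating space.
    cut -c "$((W1 + W2 + 2))-"
}

printf '%s\n' \
  'stack2@example.com,2009-11-27 01:05:47.893000000,example.net,127.0.0.1' \
  'overflow@example.com,2009-11-27 00:58:29.793000000,example.net,255.255.255.0' \
  'overflow@example.com,2009-11-27 00:58:29.646465785,example.net,256.255.255.0' \
  'overflow@example.com,2009-11-28 10:00:00.000000000,example.org,10.0.0.1' \
  | uniq_via_two_key_columns
```

The last sample row is made up: it shares the e-mail of two other rows but has a different domain, so it survives as its own key.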