I have a .csv file like this:
stack2@example.com,2009-11-27 01:05:47.893000000,example.net,127.0.0.1
overflow@example.com,2009-11-27 00:58:29.793000000,example.net,255.255.255.0
overflow@example.com,2009-11-27 00:58:29.646465785,example.net,256.255.255.0
...
I have to remove duplicate e-mails (the entire line) from the file (i.e. one of the lines containing overflow@example.com in the above example). How do I use uniq on only field 1 (separated by commas)? According to the man page, uniq doesn't have options for columns.
I tried something with sort | uniq, but it doesn't work.
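The explanation that follows (-F, $1, _[val], ++, !) describes an awk one-liner of roughly the shape below. This is a minimal sketch that assumes the sample rows are saved as test.csv (a hypothetical file name). awk prints a line only the first time its field-1 value appears, so the original order is kept and no sorting is needed.

```shell
#!/bin/sh
# Scratch directory so no existing file is overwritten.
cd "$(mktemp -d)" || exit 1

# Sample rows from the question; test.csv is a hypothetical name.
cat > test.csv <<'EOF'
stack2@example.com,2009-11-27 01:05:47.893000000,example.net,127.0.0.1
overflow@example.com,2009-11-27 00:58:29.793000000,example.net,255.255.255.0
overflow@example.com,2009-11-27 00:58:29.646465785,example.net,256.255.255.0
EOF

# _[$1]++ yields 0 (false) the first time an address is seen, so
# !_[$1]++ is true, and the line is printed, only on its first occurrence.
result=$(awk -F, '!_[$1]++' test.csv)
printf '%s\n' "$result"
```

With a real file, redirect the output to a new file rather than back into the input.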
-F sets the field separator.
$1 is the first field.
_[val] looks up val in the hash _ (a regular variable).
++ increments and returns the old value.
! returns logical not.

By sorting the file with sort first, you can then apply uniq. It seems to sort the file just fine:
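One caveat about this approach: plain uniq compares entire lines, so after sorting, the two overflow@example.com rows (which differ in their timestamps) would both survive. Below is a hedged sketch of the sort-first idea restricted to field 1, again using the hypothetical test.csv. It sorts on the first comma-separated field so duplicate addresses become adjacent, then lets awk stand in for uniq by printing a line only when its key differs from the previous line's key.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Sample rows from the question; test.csv is a hypothetical name.
cat > test.csv <<'EOF'
stack2@example.com,2009-11-27 01:05:47.893000000,example.net,127.0.0.1
overflow@example.com,2009-11-27 00:58:29.793000000,example.net,255.255.255.0
overflow@example.com,2009-11-27 00:58:29.646465785,example.net,256.255.255.0
EOF

# Whole-line uniq: no two lines are identical, so nothing is removed.
whole=$(sort test.csv | uniq)

# Sort on field 1 only, then keep a line only when its key differs
# from the previous line's key (uniq-style, but on field 1).
by_key=$(sort -t, -k1,1 test.csv | awk -F, '$1 != prev { print } { prev = $1 }')

printf '%s\n' "$by_key"
```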
You could also do some AWK magic:
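Here is one possible piece of AWK magic, offered as an illustration rather than as the answerer's exact command. It stores the most recent line for each address in an associative array and prints the array at the end. Unlike the !_[$1]++ idiom, this keeps the last occurrence of each address. Also, for (k in lines) visits keys in an unspecified order, so pipe the result through sort if order matters. test.csv is again a hypothetical name.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Sample rows from the question; test.csv is a hypothetical name.
cat > test.csv <<'EOF'
stack2@example.com,2009-11-27 01:05:47.893000000,example.net,127.0.0.1
overflow@example.com,2009-11-27 00:58:29.793000000,example.net,255.255.255.0
overflow@example.com,2009-11-27 00:58:29.646465785,example.net,256.255.255.0
EOF

# Later lines overwrite earlier ones, so each key ends up holding its last line.
result=$(awk -F, '{ lines[$1] = $0 } END { for (k in lines) print lines[k] }' test.csv)
printf '%s\n' "$result"
```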
Well, simpler than isolating the column with awk: if you need to remove every line with a certain value from a given file, why not just use grep -v?
E.g. to delete every line with the value "col2" in the second position, such as the line col1,col2,col3,col4:
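Here is a small sketch of the grep -v approach, assuming a hypothetical data.csv. The third row is included on purpose. It has col2 in its first column, and grep -v removes it too, which is the improper stripping the next paragraph warns about.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical sample file; the last row has "col2" in column 1, not column 2.
cat > data.csv <<'EOF'
col1,col2,col3,col4
a,b,c,d
col2,x,y,z
EOF

# -v inverts the match: print only lines that do NOT contain "col2" anywhere.
result=$(grep -v 'col2' data.csv)
printf '%s\n' "$result"
```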
If this isn't good enough, because some lines may be stripped improperly when the matching value shows up in a different column, you can do something like this.
Use awk to isolate the offending column, e.g.:
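The awk step could look like the sketch below, using the hypothetical data.csv and | as the custom delimiter. The later note about "[^|]+" suggests | was the delimiter in mind.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical sample file.
cat > data.csv <<'EOF'
col1,col2,col3,col4
a,b,c,d
col2,x,y,z
EOF

# Prefix every line with its second field and a "|" separator,
# e.g. "col2|col1,col2,col3,col4".
tagged=$(awk -F, '{ print $2 "|" $0 }' data.csv)
printf '%s\n' "$tagged"
```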
-F sets the field delimiter to ",", and $2 means column 2. The command prints that column, followed by some custom delimiter and then the entire line. You can then filter by removing lines that begin with the offending value:
and then strip out everything before the delimiter:
(Note: the sed command is sloppy because it doesn't escape values. Also, the sed pattern should really be something like "[^|]+" (i.e. anything but the delimiter). But hopefully this is clear enough.)
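The following sketch puts the three steps together on the same hypothetical data.csv. awk tags each line with column 2, grep -v drops lines whose tag is exactly col2, and sed removes the tag. The sed pattern follows the note's suggestion of matching anything but the delimiter. It is written as [^|]* so it works in basic (non-extended) sed regular expressions, where a bare | is an ordinary character.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical sample file.
cat > data.csv <<'EOF'
col1,col2,col3,col4
a,b,c,d
col2,x,y,z
EOF

# 1. tag each line with column 2, 2. drop lines tagged "col2",
# 3. strip everything up to and including the first "|".
result=$(awk -F, '{ print $2 "|" $0 }' data.csv |
  grep -v '^col2|' |
  sed 's/^[^|]*|//')
printf '%s\n' "$result"
```

If escaping becomes a problem, awk can also compare the field directly, e.g. awk -F, '$2 != "col2"' data.csv. This treats col2 as a plain string rather than a regular expression.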
Or, if you want to use uniq:
<mycvs.cvs tr -s ',' ' ' | awk '{print $3" "$2" "$1}' | uniq -c -f2
gives:
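The output is not shown here. The sketch below runs the same pipeline on the question's three sample rows, saved as mycvs.cvs (the file name used above). Two points are worth stating. First, a sort -t, -k1,1 step has been added in front, because uniq only merges adjacent duplicates. Second, the result keeps only the time, date and address, since the awk step drops the domain and IP. Because awk splits on whitespace, the space inside the timestamp also separates fields. After tr, $1 is the address, $2 the date and $3 the time, so uniq -f2 skips the time and date and compares only the address.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Sample rows from the question.
cat > mycvs.cvs <<'EOF'
stack2@example.com,2009-11-27 01:05:47.893000000,example.net,127.0.0.1
overflow@example.com,2009-11-27 00:58:29.793000000,example.net,255.255.255.0
overflow@example.com,2009-11-27 00:58:29.646465785,example.net,256.255.255.0
EOF

# Sort so equal addresses are adjacent (added step), turn commas into spaces,
# reorder to "time date address", then count lines whose address matches.
result=$(sort -t, -k1,1 mycvs.cvs | tr -s ',' ' ' | awk '{print $3" "$2" "$1}' | uniq -c -f2)
printf '%s\n' "$result"
```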
To consider multiple columns:
Sort and output a unique list based on column 1 and column 3:
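Here is a sketch of this with a small hypothetical colon-separated file, test.txt. The first two rows share both column 1 (alice) and column 3 (x), so only one of them is kept, while the row with y in column 3 survives.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical colon-separated sample file.
cat > test.txt <<'EOF'
alice:1:x
alice:2:x
alice:3:y
bob:1:x
EOF

# -u keeps one line per distinct (column 1, column 3) key pair.
result=$(sort -u -t : -k 1,1 -k 3,3 test.txt)
printf '%s\n' "$result"
```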
-t :            colon is the separator
-k 1,1 -k 3,3   based on column 1 and column 3
-u              for unique
-t,             so comma is the delimiter
-k1,1           for the key field 1

Test result:
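The following sketch runs that test on the question's sample rows, saved as the hypothetical test.csv. sort -u -t, -k1,1 keeps one line per e-mail address and orders the output by address. sort decides which of the two overflow@example.com lines is kept, so don't rely on a particular one.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Sample rows from the question; test.csv is a hypothetical name.
cat > test.csv <<'EOF'
stack2@example.com,2009-11-27 01:05:47.893000000,example.net,127.0.0.1
overflow@example.com,2009-11-27 00:58:29.793000000,example.net,255.255.255.0
overflow@example.com,2009-11-27 00:58:29.646465785,example.net,256.255.255.0
EOF

# Key on field 1 only (-k1,1) with comma as delimiter (-t,); -u keeps one line per key.
result=$(sort -u -t, -k1,1 test.csv)
printf '%s\n' "$result"
```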