Is there a way to 'uniq' by column?

2019-01-04 16:32发布

I have a .csv file like this:

stack2@example.com,2009-11-27 01:05:47.893000000,example.net,127.0.0.1
overflow@example.com,2009-11-27 00:58:29.793000000,example.net,255.255.255.0
overflow@example.com,2009-11-27 00:58:29.646465785,example.net,256.255.255.0
...

I have to remove duplicate e-mails (the entire line) from the file (i.e. one of the lines containing overflow@example.com in the above example). How do I use uniq on only field 1 (separated by commas)? According to man, uniq doesn't have options for columns.

I tried something with sort | uniq but it doesn't work.

8条回答
你好瞎i
2楼-- · 2019-01-04 17:12

If you want to retain the last one of the duplicates you could use

 tac a.csv | sort -u -t, -r -k1,1 |tac

Which was my requirement

here

tac will reverse the file line by line

查看更多
Evening l夕情丶
3楼-- · 2019-01-04 17:12

Here is a very nifty way.

First format the content such that the column to be compared for uniqueness is a fixed width. One way of doing this is to use awk printf with a field/column width specifier ("%15s").

Now the -f and -w options of uniq can be used to skip preceding fields/columns and to specify the comparison width (column(s) width).

Here are three examples.

In the first example...

1) Temporarily make the column of interest a fixed width greater than or equal to the field's max width.

2) Use -f uniq option to skip the prior columns, and use the -w uniq option to limit the width to the tmp_fixed_width.

3) Remove trailing spaces from the column to "restore" it's width (assuming there were no trailing spaces beforehand).

printf "%s" "$str" \
| awk '{ tmp_fixed_width=15; uniq_col=8; w=tmp_fixed_width-length($uniq_col); for (i=0;i<w;i++) { $uniq_col=$uniq_col" "}; printf "%s\n", $0 }' \
| uniq -f 7 -w 15 \
| awk '{ uniq_col=8; gsub(/ */, "", $uniq_col); printf "%s\n", $0 }'

In the second example...

Create a new uniq column 1. Then remove it after the uniq filter has been applied.

printf "%s" "$str" \
| awk '{ uniq_col_1=4; printf "%15s %s\n", uniq_col_1, $0 }' \
| uniq -f 0 -w 15 \
| awk '{ $1=""; gsub(/^ */, "", $0); printf "%s\n", $0 }'

The third example is the same as the second, but for multiple columns.

printf "%s" "$str" \
| awk '{ uniq_col_1=4; uniq_col_2=8; printf "%5s %15s %s\n", uniq_col_1, uniq_col_2, $0 }' \
| uniq -f 0 -w 5 \
| uniq -f 1 -w 15 \
| awk '{ $1=$2=""; gsub(/^ */, "", $0); printf "%s\n", $0 }'
查看更多
登录 后发表回答