Data manipulation in Linux [duplicate]

Posted 2019-08-27 14:54

Question:

This question already has an answer here:

  • Using awk, remove lines with duplicate pair of columns in different indexes (2 answers)

I am trying to filter or remove some lines in a text file based on some criteria (I tried with awk, but without success). The file contains columns separated by a comma (,). An example of such a file is:

source,destination
192.168.1.2,8.8.8.8
8.8.8.8,192.168.1.2

I would like to remove or filter out those lines that carry the same information, just with the source and destination swapped.

So if the file contains the same pair with the source and destination reversed:

192.168.1.2,8.8.8.8
8.8.8.8,192.168.1.2

then only show one of the lines, not both.
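For the sample above, the desired output would therefore be something like:

source,destination
192.168.1.2,8.8.8.8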

Answer 1:

You can try this, but be careful if the file is huge, as it keeps all the seen key pairs in memory.

awk -F, '!($1 FS $2 in dup){dup[$1 FS $2]=dup[$2 FS $1]; print}' <file>
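This works because the assignment dup[$1 FS $2]=dup[$2 FS $1] also creates the reversed key as a side effect (merely referencing an awk array element is enough to create it), so by the time the swapped line comes along, its key is already in the array and the line is skipped.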

Same idea:

awk -F, '!(($1 FS $2 in dup)||($2 FS $1 in dup)){dup[$1 FS $2]; print}' <file>
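Below is a slightly expanded sketch of the second one-liner, with comments and an assumed file name traffic.csv:

awk -F, '
!(($1 FS $2 in dup) || ($2 FS $1 in dup)) {   # skip a pair already seen in either order
    dup[$1 FS $2]                             # referencing the index is enough to create it
    print                                     # first occurrence of the pair: keep the line
}' traffic.csv

For the sample input from the question, both commands print:

source,destination
192.168.1.2,8.8.8.8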


Tags: linux csv awk sed