I need a solution to delete duplicate lines where the first field is an IPv4 address. For example, I have the following lines in a file:
192.168.0.1/text1/text2
192.168.0.18/text03/text7
192.168.0.15/sometext/sometext
192.168.0.1/text100/ntext
192.168.0.23/othertext/sometext
So all that needs to match in the previous scenario is the IP address. All I know is that the regex for an IP address is:
\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
It would be nice if the solution were a one-liner and as fast as possible.
If the file contains lines only in the format you show, i.e. the first field is always an IP address, you can get away with one line of awk:
awk '!x[$1]++' FS="/" $PATH_TO_FILE
EDIT: This removes duplicates based only on IP address. I'm not sure this is what the OP wanted when I wrote this answer.
If you don't need to preserve the original ordering, one way to do this is to use sort:
sort -u <file>
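Note that sort -u as written removes only fully identical lines. If, as in the awk answer, you want to deduplicate on the IP field alone, restricting the comparison key should also work (a sketch, assuming / is the only field separator and GNU sort):

sort -t/ -u -k1,1 <file>

Keep in mind that sort does not preserve the original line order, and which of the duplicate lines survives after sorting is not guaranteed unless you make the sort stable.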
The awk that ArjunShankar posted worked wonders for me.
I had a huge list of items with multiple copies of each value in field 1 and a sequential number in field 2. I needed the "newest" entry, i.e. the highest sequential number, for each unique field 1.
I had to pre-sort with sort -rn to push the highest numbers into the "first entry" position, because the awk one-liner keeps the first occurrence it sees and skips later duplicates, rather than keeping the last/most recent entry in the list.
Thanks, ArjunShankar!
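For anyone wanting to reproduce this, a pipeline along these lines should do it (a sketch, not necessarily the exact command I used; it assumes the sequential number is the second /-separated field, so adjust the sort key to your data):

sort -t/ -k2,2rn file | awk -F/ '!x[$1]++'

The sort puts the line with the highest field 2 at the top of each group, and the awk then keeps that first (highest) entry per unique field 1.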