sed/awk + regex delete duplicate lines where first

Posted 2019-04-12 03:17

I need a solution to delete duplicate lines where the first field is an IPv4 address. For example, I have the following lines in a file:

192.168.0.1/text1/text2
192.168.0.18/text03/text7
192.168.0.15/sometext/sometext
192.168.0.1/text100/ntext
192.168.0.23/othertext/sometext

In this scenario, the only thing that should be compared between lines is the IP address. All I know is that the regex for an IP address is:

\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b

It would be nice if the solution is one line and as fast as possible.

Tags: awk sed ip match

3 Answers

甜甜的少女心

#2 · 2019-04-12 03:24

The awk that ArjunShankar posted worked wonders for me.

I had a huge list of items with repeated values in field 1 and a sequential number in field 2. I needed the "newest" entry, i.e. the one with the highest sequential number, for each unique value of field 1.

I had to run sort -rn first to push those entries into the "first entry" position, because the awk command prints the first line it sees for each key and skips every later one, rather than keeping the last/most recent entry in the list.

Thanks, ArjunShankar!
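A minimal sketch of that sort-then-dedupe pipeline. The "/" separator, the numeric value in field 2, and the function name keep_highest are assumptions made for illustration; the actual data layout is not given above.

```shell
# Hypothetical helper: keep, for each distinct field 1, the line with the
# highest number in field 2. Assumes "/"-separated fields.
keep_highest() {
  # 1) sort descending by the numeric value of field 2
  # 2) print only the first line seen for each field-1 key
  sort -t/ -k2,2nr "$@" | awk -F/ '!seen[$1]++'
}

# Usage: keep_highest file   (or pipe data in on stdin)
```

Because awk keeps the first occurrence of each key, sorting in descending order beforehand is what makes the highest-numbered entry the one that survives.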


一纸荒年 Trace。

#3 · 2019-04-12 03:28

If you don't need to preserve the original ordering, one way to do this is with sort. Note that -u on its own only drops lines that are identical in their entirety:

sort -u <file>
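To deduplicate on the IP address alone rather than on the whole line, sort can be told to compare only the first "/"-separated field. This is a sketch under that assumption; the function name dedupe_sorted is made up for the example, and LC_ALL=C is set only to make the ordering predictable.

```shell
# Hypothetical helper: one output line per distinct first field, sorted by it.
dedupe_sorted() {
  # -t/     use "/" as the field separator
  # -k1,1   restrict the sort key to field 1 only
  # -u      keep a single line from each set of lines with equal keys
  LC_ALL=C sort -t/ -k1,1 -u "$@"
}

# Usage: dedupe_sorted file   (or pipe data in on stdin)
```

The output is ordered as text, not numerically by octet, and which of the duplicate lines is kept should not be relied on across sort implementations.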


劫难

#4 · 2019-04-12 03:39

If the file contains only lines in the format you show, i.e. the first field is always an IP address, you can get away with one line of awk:

awk '!x[$1]++' FS="/" "$PATH_TO_FILE"

EDIT: This removes duplicates based only on IP address. I'm not sure this is what the OP wanted when I wrote this answer.
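For readers unfamiliar with the idiom, here is a sketch of what the one-liner does when written out, plus a variant for files where not every line starts with an IP address. The function names and the array name seen are made up for the example, and the pattern is a deliberately loose dotted-quad check (it does not validate octet ranges).

```shell
# Long form of the one-liner: print a line only the first time its
# first field is seen, then count the occurrence.
dedupe_by_first_field() {
  awk -F/ '
    {
      if (seen[$1] == 0) print   # key not seen before: keep the line
      seen[$1]++                 # remember the key for later lines
    }
  ' "$@"
}

# Variant: lines whose first field does not look like an IPv4 address
# are passed through untouched; only IP lines are deduplicated.
dedupe_ip_lines() {
  awk -F/ '
    $1 !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { print; next }
    !seen[$1]++
  ' "$@"
}

# Usage: dedupe_by_first_field file   or   dedupe_ip_lines file
```

Both versions preserve the original line order and make a single pass over the input, keeping one array entry per distinct first field.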
