How do you extract IP addresses from files using a

How to extract a text part by regexp in linux shell? Lets say, I have a file where in every line is an IP address, but on a different position. What is the simplest way to extract those IP addresses using common unix command-line tools?

标签： regex linux bash unix command-line

17条回答

冷血范

2楼-- · 2019-01-16 04:29

All of the previous answers have one or more problems. The accepted answer allows ip numbers like 999.999.999.999. The currently second most upvoted answer requires prefixing with 0 such as 127.000.000.001 or 008.008.008.008 instead of 127.0.0.1 or 8.8.8.8. Apama has it almost right, but that expression requires that the ipnumber is the only thing on the line, no leading or trailing space allowed, nor can it select ip's from the middle of a line.

I think the correct regex can be found on http://www.regextester.com/22

So if you want to extract all ip-adresses from a file use:

grep -Eo "(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])" file.txt

If you don't want duplicates use:

grep -Eo "(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])" file.txt | sort | uniq

Please comment if there still are problems in this regex. It easy to find many wrong regex for this problem, I hope this one has no real issues.

0人赞添加讨论(0) 举报

聊天终结者

3楼-- · 2019-01-16 04:30

You could use grep to pull them out.

grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' file.txt

0人赞添加讨论(0) 举报

等我变得足够好

4楼-- · 2019-01-16 04:31

I'd suggest perl. (\d+.\d+.\d+.\d+) should probably do the trick.

EDIT: Just to make it more like a complete program, you could do something like the following (not tested):

#!/usr/bin/perl -w
use strict;

while (<>) {
    if (/(\d+\.\d+\.\d+\.\d+)/) {
        print "$1\n";
    }
}

This handles one IP per line. If you have more than one IPs per line, you need to use the /g option. man perlretut gives you a more detailed tutorial on regular expressions.

0人赞添加讨论(0) 举报

可以哭但决不认输i

5楼-- · 2019-01-16 04:31

I wrote an informative blog article about this topic: How to Extract IPv4 and IPv6 IP Addresses from Plain Text Using Regex.

In the article there's a detailed guide of the most common different patterns for IPs, often required to be extracted and isolated from plain text using regular expressions.
This guide is based on CodVerter's IP Extractor source code tool for handling IP addresses extraction and detection when necessary.

If you wish to validate and capture IPv4 Address this pattern can do the job:

\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)[.]){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b

or to validate and capture IPv4 Address with Prefix ("slash notation"):

\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)[.]){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?/[0-9]{1,2})\b

or to capture subnet mask or wildcard mask:

(255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)

or to filter out subnet mask addresses you do it with regex negative lookahead:

\b((?!(255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)))(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)[.]){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b

For IPv6 validation you can go to the article link I have added at the top of this answer.
Here is an example for capturing all the common patterns (taken from CodVerter`s IP Extractor Help Sample):

0人赞添加讨论(0) 举报

Evening l夕情丶

6楼-- · 2019-01-16 04:33

I usually start with grep, to get the regexp right.

# [multiple failed attempts here]
grep    '[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*'                 file  # good?
grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' file  # good enough

Then I'd try and convert it to sed to filter out the rest of the line. (After reading this thread, you and I aren't going to do that anymore: we're going to use grep -o instead)

sed -ne 's/.*\([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\).*/\1/p  # FAIL

That's when I usually get annoyed with sed for not using the same regexes as anyone else. So I move to perl.

$ perl -nle '/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/ and print $&'

Perl's good to know in any case. If you've got a teeny bit of CPAN installed, you can even make it more reliable at little cost:

$ perl -MRegexp::Common=net -nE '/$RE{net}{IPV4}/ and say $&' file(s)

0人赞添加讨论(0) 举报

我命由我不由天

7楼-- · 2019-01-16 04:33

You can use some shell helper I made: https://github.com/philpraxis/ipextract

included them here for convenience:

#!/bin/sh
ipextract () 
{ 
egrep --only-matching -E  '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)' 
}

ipextractnet ()
{ 
egrep --only-matching -E  '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/[[:digit:]]+' 
}

ipextracttcp ()
{ 
egrep --only-matching -E  '[[:digit:]]+/tcp' 
}

ipextractudp ()
{ 
egrep --only-matching -E  '[[:digit:]]+/udp' 
}

ipextractsctp ()
{ 
egrep --only-matching -E  '[[:digit:]]+/sctp' 
}

ipextractfqdn ()
{ 
egrep --only-matching -E  '[a-zA-Z0-9]+[a-zA-Z0-9\-\.]*\.[a-zA-Z]{2,}' 
}

Load it / source it (when stored in ipextract file) from shell:

$ . ipextract

Use them:

$ ipextract < /etc/hosts
127.0.0.1
255.255.255.255
$

For some example of real use:

ipextractfqdn < /var/log/snort/alert | sort -u
dmesg | ipextractudp

0人赞添加讨论(0) 举报

1 2 3 下一页

How do you extract IP addresses from files using a

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间