How to extract part of a text by regexp in the Linux shell? Let's say I have a file where every line contains an IP address, but at a different position. What is the simplest way to extract those IP addresses using common Unix command-line tools?
All of the previous answers have one or more problems. The accepted answer allows IP addresses like 999.999.999.999. The currently second most upvoted answer requires zero-padding, such as 127.000.000.001 or 008.008.008.008 instead of 127.0.0.1 or 8.8.8.8. Apama has it almost right, but that expression requires the IP address to be the only thing on the line: no leading or trailing space is allowed, and it cannot select IPs from the middle of a line.
I think the correct regex can be found on http://www.regextester.com/22
So if you want to extract all IP addresses from a file, use:
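A minimal sketch of such a command, assuming a grep that supports `-o`, `-E`, and the `\b` word-boundary extension (GNU grep does). The pattern is one common strict-octet form and is not necessarily the expression from the linked page; `extract_ips` is a hypothetical helper name.

```shell
# Each octet must be 0-255 with no leading zeros.
octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
# \b keeps the match from starting or ending in the middle of a longer number.
ipv4="\\b($octet\\.){3}$octet\\b"

# extract_ips (hypothetical helper): print every address on its own line.
# Reads the named files, or stdin when no file is given.
extract_ips() {
    grep -ohE "$ipv4" "$@"
}

printf 'host 192.168.1.10 up\nbad 999.999.999.999\n8.8.8.8 and 10.0.0.1\n' | extract_ips
```

With this sample input the sketch prints 192.168.1.10, 8.8.8.8 and 10.0.0.1, one per line, and skips 999.999.999.999. For a real file, something like `extract_ips file.txt` should work, where `file.txt` is a placeholder name.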
If you don't want duplicates use:
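One way to drop duplicates is to pipe the matches through `sort -u`. This is a sketch under the same assumptions as before (GNU grep, a strict-octet pattern that may differ from the linked one); `unique_ips` is a hypothetical helper name.

```shell
octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
ipv4="\\b($octet\\.){3}$octet\\b"

# unique_ips (hypothetical helper): extract addresses, then keep one copy of each.
unique_ips() {
    grep -ohE "$ipv4" "$@" | sort -u
}

printf '10.0.0.1 GET /\n10.0.0.2 GET /\n10.0.0.1 POST /x\n' | unique_ips
```

Note that `sort -u` orders the addresses as plain strings, not numerically by octet.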
Please comment if there are still problems with this regex. It is easy to find many wrong regexes for this problem; I hope this one has no real issues.
You could use grep to pull them out.
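As a rough illustration of the grep approach, here is a deliberately loose sketch. It assumes a grep with `-o` and `-E`; `loose_ips` is a hypothetical helper name. The pattern only checks the shape of the address, not the range of each octet.

```shell
# loose_ips (hypothetical helper): four dot-separated groups of 1-3 digits.
# Octets are not range-checked, so 999.999.999.999 is printed as well.
loose_ips() {
    grep -ohE '[0-9]{1,3}(\.[0-9]{1,3}){3}' "$@"
}

printf 'src=172.16.0.5 dst=999.999.999.999\n' | loose_ips
```

This is usually enough for logs that are known to contain only real addresses; a stricter pattern is needed when arbitrary numbers may appear.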
I'd suggest perl. (\d+\.\d+\.\d+\.\d+) should probably do the trick.
EDIT: Just to make it more like a complete program, you could do something like the following (not tested):
This handles one IP per line. If you have more than one IP per line, you need to use the /g option. man perlretut gives you a more detailed tutorial on regular expressions.
I wrote an informative blog article about this topic: How to Extract IPv4 and IPv6 IP Addresses from Plain Text Using Regex.
In the article there's a detailed guide of the most common different patterns for IPs, often required to be extracted and isolated from plain text using regular expressions.
This guide is based on the source code of CodVerter's IP Extractor, a tool for extracting and detecting IP addresses when necessary.
If you wish to validate and capture an IPv4 address, this pattern can do the job:
or, to validate and capture an IPv4 address with a prefix ("slash notation"):
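As an illustration of the slash-notation case, here is a sketch that assumes GNU grep (`-o`, `-E`, `\b`). It is not the pattern from the article; `extract_cidrs` is a hypothetical helper name, and the prefix length is limited to 0-32.

```shell
octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
# Prefix length: 0-32.
prefix='(3[0-2]|[12]?[0-9])'
cidr="\\b($octet\\.){3}$octet/$prefix\\b"

# extract_cidrs (hypothetical helper): print every address/prefix pair found.
extract_cidrs() {
    grep -ohE "$cidr" "$@"
}

printf 'route 10.0.0.0/8 via eth0\nnet 192.168.1.0/24 bad 10.0.0.0/33\n' | extract_cidrs
```

With this input the sketch prints 10.0.0.0/8 and 192.168.1.0/24 and rejects 10.0.0.0/33, because the trailing `\b` prevents a partial match on the prefix length.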
or, to capture a subnet mask or wildcard mask:
or, to filter out subnet mask addresses, use a regex negative lookahead:
For IPv6 validation you can go to the article link I have added at the top of this answer.
Here is an example for capturing all the common patterns (taken from CodVerter's IP Extractor Help Sample):
I usually start with grep, to get the regexp right.
Then I'd try to convert it to sed to filter out the rest of the line. (After reading this thread, you and I aren't going to do that anymore: we're going to use grep -o instead.)
That's when I usually get annoyed with sed for not using the same regexes as anyone else. So I move to perl.
Perl's good to know in any case. If you've got a teeny bit of CPAN installed, you can even make it more reliable at little cost:
You can use a shell helper I made: https://github.com/philpraxis/ipextract
The functions are included here for convenience:
Load it / source it (when stored in ipextract file) from shell:
Use them:
For an example of real use: