IP address/network parsing from a text file using Python

Posted 2019-06-09 03:41

Question:

I have the text file below and need some help parsing the IP addresses out of it.

The text file has this form:

abc 10.1.1.1/32   aabbcc
def 11.2.0.0/16   eeffgg
efg 0.0.0.0/0   ddeeff

In other words, a bunch of IP networks appear as part of a log file. The output should look like this:

10.1.1.1/32
11.2.0.0/16
0.0.0.0/0

I have the code below, but it does not output the required information:

import re

file = open(filename, 'r')
for eachline in file.readlines():
    ip_regex = re.findall(r'(?:\d{1,3}\.){3}\d{1,3}', eachline)
    print(ip_regex)

Answer 1:

First, your regex doesn't even attempt to capture anything but four dotted numbers, so of course it's not going to match anything else, like the /32 on the end. If you just add, e.g., /\d{1,2} to the end, that fixes it:

(?:\d{1,3}\.){3}\d{1,3}/\d{1,2}
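A minimal runnable sketch of that fix, assuming the question's three sample lines are held in a string rather than read from a file (the names NETWORK_RE, sample and networks are illustrative). The pattern only checks the shape of the text, so something like 999.1.1.1/99 would also match.

```python
import re

# The pattern from the answer: four dotted groups of 1-3 digits, then "/"
# and a 1-2 digit prefix length. It checks shape only, not value ranges.
NETWORK_RE = re.compile(r'(?:\d{1,3}\.){3}\d{1,3}/\d{1,2}')

# Sample lines copied from the question.
sample = """abc 10.1.1.1/32   aabbcc
def 11.2.0.0/16   eeffgg
efg 0.0.0.0/0   ddeeff
"""

networks = []
for line in sample.splitlines():
    # The group is non-capturing, so findall returns whole matches
    # (an empty list for a line with no match).
    networks.extend(NETWORK_RE.findall(line))

for net in networks:
    print(net)
```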

However, if you don't understand regular expressions well enough to see that, you probably shouldn't be using a regex as a piece of "magic" that you'll never be able to debug or extend. Using str methods like split or find is a bit more verbose, but probably easier for a novice to understand:

for line in file:
    for part in line.split():
        try:
            address, network = part.split('/')
            a, b, c, d = address.split('.')
        except ValueError:
            pass  # not in the right format
        else:
            # do something with part, or address and network, or whatever
            print(part)
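The loop above accepts any four dot-separated pieces, so a token like x.y.z.w/ab would get through. If you also want numeric range checks, one way to add them with plain str methods and int() is sketched here; parse_network and found are hypothetical names, and the 0-255 and 0-32 limits assume IPv4 only.

```python
def parse_network(part):
    """Return (address, prefix) if part looks like a.b.c.d/n, else None.

    Hypothetical helper: it adds range checks the answer's loop leaves out.
    """
    try:
        address, network = part.split('/')  # ValueError unless exactly one "/"
        octets = [int(o) for o in address.split('.')]  # ValueError on non-numbers
        prefix = int(network)
    except ValueError:
        return None  # not in the right format
    if len(octets) != 4:
        return None  # IPv4 needs exactly four octets
    # Assumed IPv4 limits: each octet 0-255, prefix length 0-32.
    if not all(0 <= o <= 255 for o in octets) or not 0 <= prefix <= 32:
        return None
    return address, prefix


lines = [
    "abc 10.1.1.1/32   aabbcc",
    "def 11.2.0.0/16   eeffgg",
    "efg 0.0.0.0/0   ddeeff",
]

found = []
for line in lines:
    for part in line.split():
        if parse_network(part) is not None:
            found.append(part)

print(found)
```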

As a side note, depending on what you're actually doing with these things, you might want to use the ipaddress module (or its backport on PyPI for Python 2.6-3.2) rather than string parsing:

>>> import ipaddress
>>> s = '10.1.1.1/32'
>>> a = ipaddress.ip_network(s)
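A few of the things an ip_network object gives you, using the standard-library ipaddress module in Python 3.3+ (the values in the comments are for this particular input). One behavior worth knowing when parsing logs: ip_network() is strict by default, so an entry such as 10.1.1.1/16, which has host bits set, raises ValueError unless you pass strict=False.

```python
import ipaddress

net = ipaddress.ip_network('11.2.0.0/16')
print(net.network_address)   # 11.2.0.0
print(net.prefixlen)         # 16
print(net.netmask)           # 255.255.0.0
print(net.num_addresses)     # 65536
print(ipaddress.ip_address('11.2.3.4') in net)  # True

# Strict mode (the default): host bits set after the prefix raise ValueError.
try:
    ipaddress.ip_network('10.1.1.1/16')
except ValueError as exc:
    print('rejected:', exc)

# strict=False masks off the host bits instead of raising.
print(ipaddress.ip_network('10.1.1.1/16', strict=False))  # 10.1.0.0/16
```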

You can combine that with either of the above:

import ipaddress

for line in file:
    for part in line.split():
        try:
            a = ipaddress.ip_network(part)
        except ValueError:
            pass  # not the right format
        else:
            # do something with a and its nifty methods
            print(a)
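A self-contained version of that loop, run against the question's sample through io.StringIO instead of a real file; extract_networks is a hypothetical helper name. Because ip_network() raises ValueError both for tokens that are not networks at all and for strict-mode host-bit violations, both kinds are skipped silently here; passing strict=False would normalize the latter instead.

```python
import io
import ipaddress


def extract_networks(f):
    """Yield an ip_network object for each whitespace-separated token that parses as one.

    Hypothetical helper; f can be any iterable of lines (an open file, or
    io.StringIO as below).
    """
    for line in f:
        for part in line.split():
            try:
                net = ipaddress.ip_network(part)
            except ValueError:
                continue  # e.g. 'abc', 'aabbcc', or a network with host bits set
            yield net


sample = io.StringIO(
    "abc 10.1.1.1/32   aabbcc\n"
    "def 11.2.0.0/16   eeffgg\n"
    "efg 0.0.0.0/0   ddeeff\n"
)

networks = list(extract_networks(sample))
for net in networks:
    print(net)
```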


Answer 2:

In this particular case, a regex might be overkill; you could use split:

with open(filename) as f:
    ipList = [line.split()[1] for line in f]

This should produce a list of strings, which are the IP networks (e.g. '10.1.1.1/32').
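The one-liner raises IndexError on any blank or single-field line. A slightly more defensive sketch, still assuming (as in the question's sample) that the network is always the second field; second_column is a hypothetical name, and it does no validation of the field's contents.

```python
import io


def second_column(f):
    """Return the second whitespace-separated field of every line that has one.

    Hypothetical helper: like the one-liner, but blank or short lines are
    skipped instead of raising IndexError.
    """
    result = []
    for line in f:
        fields = line.split()
        if len(fields) >= 2:
            result.append(fields[1])
    return result


sample = io.StringIO(
    "abc 10.1.1.1/32   aabbcc\n"
    "\n"
    "def 11.2.0.0/16   eeffgg\n"
    "efg 0.0.0.0/0   ddeeff\n"
)
ip_list = second_column(sample)
print(ip_list)
```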



Tags: python regex ip