Perl-like regex in Python

Posted 2020-08-17 06:35

In Perl I would do something like this to pull several fields out of a line with a regexp, delimiting each field with () and retrieving them with $1, $2, and so on:

foreach $line (@lines)
{
  $line =~ m/(.*?):([^-]*)-(.*)/;
  $field_1 = $1;
  $field_2 = $2;
  $field_3 = $3;
}

How could I do something like this in Python?

5 answers
霸刀☆藐视天下
#2 · 2020-08-17 06:42

In Perl, you'd be much better off using an array than suffixing a bunch of scalars with numbers. E.g.

foreach my $line ( @lines ) { 
    my @matches = ( $line =~ m/(.*?):([^-]*)-(.*)/ );
    ...
}

In Python, the re module returns a match object containing the capture-group information. So you could write:

import re
match = re.search(r'(.*?):([^-]*)-(.*)', line)

Then your matches would be available in match.group(1), match.group(2), etc.
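As a minimal sketch of the above (the sample line is made up for illustration), the match object gives you the whole match, individual groups, or all groups at once. Note that re.search returns None when nothing matches, so it is worth checking before calling group():

```python
import re

line = "alpha:beta-gamma"  # hypothetical input line, not from the question
match = re.search(r'(.*?):([^-]*)-(.*)', line)

if match is not None:
    print(match.group(0))     # the whole match: alpha:beta-gamma
    print(match.group(1))     # alpha
    print(match.group(2, 3))  # ('beta', 'gamma')
    # Closest analogue of Perl's @matches: all groups as a list.
    fields = list(match.groups())
    print(fields)             # ['alpha', 'beta', 'gamma']
```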

神经病院院长
#3 · 2020-08-17 06:43

"Canonical" Python translation of your snippet...:

import re

myre = re.compile(r'(.*?):([^-]*)-(.*)')
for line in lines:
    mo = myre.search(line)
    field_1, field_2, field_3 = mo.groups()

Importing re is a must (imports are normally done at the top of a module, but that's not mandatory). Precompiling the RE is optional (if you use the re.search function instead, it will compile your pattern on the fly) but recommended (so you don't rely on the module cache of compiled RE objects for your performance, and also in order to have a RE object and call its methods, which is more common in Python).

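You can use either the match method (which always tries matching from the start, whether or not your pattern starts with '^') or the search method (which tries matching anywhere). With your given pattern they should be equivalent as long as the line has no embedded newline ahead of the matching part: the leading (.*?) can absorb any prefix, but '.' does not match a newline by default.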
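A quick way to convince yourself about match versus search with this pattern (the sample strings here are invented for the demonstration):

```python
import re

# Same pattern as in the question.
pattern = re.compile(r'(.*?):([^-]*)-(.*)')

line = "host:8080-up"  # hypothetical single-line input
print(pattern.match(line).groups())   # ('host', '8080', 'up')
print(pattern.search(line).groups())  # ('host', '8080', 'up')

# With a newline ahead of the matching part the two differ,
# because '.' does not match '\n' unless re.DOTALL is given.
multi = "\nhost:8080-up"
print(pattern.match(multi))            # None
print(pattern.search(multi).groups())  # ('host', '8080', 'up')
```

So for ordinary single-line input either method should do; search is just the more forgiving choice.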

The .groups() method returns all matching groups so you can assign them all in one gulp (using a list in Python, just like using an array in Perl, would probably be more normal, but since you chose to use scalars in Perl you can do the equivalent in Python too).

This will fail with an exception if any line does not match the RE (search returns None, so mo.groups() raises AttributeError), which is fine if you know they all do match (I'm not sure what the behavior of your Perl is, but I think it would "reuse" the previous matching line's values instead, which is peculiar... unless, again, you know all lines match ;-). If you want to just skip non-matching lines, change the last statement to the following two:

    if mo:
        field_1, field_2, field_3 = mo.groups()
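Putting the pieces together, here is one possible sketch of the whole loop that skips non-matching lines and collects the fields. The helper name parse_lines and the sample data are made up for illustration:

```python
import re

FIELD_RE = re.compile(r'(.*?):([^-]*)-(.*)')

def parse_lines(lines):
    """Return a list of (field_1, field_2, field_3) tuples for matching lines."""
    records = []
    for line in lines:
        mo = FIELD_RE.search(line)
        if mo is None:
            # search returns None when the pattern is not found; skip the line.
            continue
        records.append(mo.groups())
    return records

sample = ["a:b-c", "no separators here", "x:y-z-w"]  # hypothetical input
print(parse_lines(sample))  # [('a', 'b', 'c'), ('x', 'y', 'z-w')]
```

The last sample line shows that only the first '-' after the colon acts as a separator; everything after it lands in the third field.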
男人必须洒脱
#4 · 2020-08-17 06:44

Python supports regular expressions through the re module. The re.search() function returns a match object, whose methods such as group() retrieve the "capturing group" information.

For example:

m = re.search(r'(.*?):([^-]*)-(.*)', line)
field_1 = m.group(1)
field_2 = m.group(2)
field_3 = m.group(3)
够拽才男人
#5 · 2020-08-17 06:56

Just as an alternative example, Python provides very nice support for named capture groups (in fact, Python pioneered support for named capture groups).

To use a named capture group, you just add ?P<the_name_of_the_group> inside the opening parenthesis of the capture group.

This allows you to get all of your matches in a dictionary very easily:

>>> import re
>>> x = re.search(r"name: (?P<name>\w+) age: (?P<age>\d+)", "name: Bob age: 20")
>>> x.groupdict()
{'name': 'Bob', 'age': '20'}

Here's the OP's example, modified to use named capture groups:

import re

find_fields_regex = re.compile(r'(?P<field1>.*?):(?P<field2>[^-]*)-(?P<field3>.*)')
for line in lines:
    search_result = find_fields_regex.search(line)
    all_the_fields = search_result.groupdict()

Now all_the_fields is a dictionary with keys corresponding to the capture group names ("field1", "field2", and "field3") and the values corresponding to the contents of the respective capture groups.
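For instance, a runnable variant of the loop above might look like this. The sample lines are invented, and the None check is an addition, since search returns None for a line that does not match:

```python
import re

find_fields_regex = re.compile(r'(?P<field1>.*?):(?P<field2>[^-]*)-(?P<field3>.*)')

lines = ["user:42-active", "malformed line"]  # hypothetical input
parsed = []
for line in lines:
    search_result = find_fields_regex.search(line)
    if search_result is None:
        continue  # skip lines that do not fit the pattern
    all_the_fields = search_result.groupdict()
    parsed.append(all_the_fields)

print(parsed)
# [{'field1': 'user', 'field2': '42', 'field3': 'active'}]
print(parsed[0]["field2"])  # 42
```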

Why you should prefer named capture groups

  • With named capture groups, it doesn't matter if you modify the regex pattern to add more capture groups or remove existing ones; everything still gets put into the dictionary under the correct keys. Without named capture groups, you have to double-check your variable assignments every time the number of groups changes.
  • Named capture groups make your capture groups self-documenting.
  • You can still use numbers to refer to the groups if you want:
>>> import re
>>> x = re.search(r"name: (?P<name>\w+) age: (?P<age>\d+)", "name: Bob age: 20")
>>> x.groupdict()
{'name': 'Bob', 'age': '20'}
>>> x.group(1)
'Bob'
>>> x.group(2)
'20'
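To illustrate the first point in the list, here is a small sketch in which a later revision of the pattern gains an extra group at the front. The extra group name "label" is hypothetical. Positional access shifts, while named access keeps working:

```python
import re

text = "name: Bob age: 20"

# Original pattern with two named groups.
v1 = re.search(r"name: (?P<name>\w+) age: (?P<age>\d+)", text)
# Revised pattern: a new capture group is added ahead of the existing ones.
v2 = re.search(r"(?P<label>name): (?P<name>\w+) age: (?P<age>\d+)", text)

print(v1.group(1), v2.group(1))          # Bob name  (group 1 now means something else)
print(v1.group("age"), v2.group("age"))  # 20 20     (named access is unaffected)
```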

Some good regex resources:

The star\"
#6 · 2020-08-17 07:01

And don't forget that in Python, TIMTOWTDI ;)

import re
p = re.compile(r'(\d+)\.(\d+)')
num_parts = p.findall('11.22   333.444') # List of tuples.
print(num_parts)                         # [('11', '22'), ('333', '444')]
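The same findall approach can be tried with the pattern from the question, run over a whole block of text at once instead of line by line. This is only a sketch with invented sample text, and it assumes every line is well formed:

```python
import re

field_re = re.compile(r'(.*?):([^-]*)-(.*)')

text = "a:b-c\nx:y-z"  # hypothetical two-line input
rows = field_re.findall(text)  # one tuple of groups per match
print(rows)  # [('a', 'b', 'c'), ('x', 'y', 'z')]

# Caveat: [^-]* can match newlines, so a line with a ':' but no '-'
# could make a match run on into the following line.
```

If the input may contain malformed lines, looping over the lines and checking each match individually is probably the safer route.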