While matching an email address, after I match something like yasar@webmail, I want to capture one or more of (\.\w+) (what I am doing is a little more complicated; this is just an example). I tried adding (\.\w+)+, but it only captures the last match. For example, yasar@webmail.something.edu.tr matches, but the group only includes .tr after the yasar@webmail part, so I lose the .something and .edu groups. Can I do this in Python regular expressions, or would you suggest matching everything first and splitting the subpatterns later?
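A minimal reproduction of the behaviour described, using the standard re module. When a capturing group is repeated, re keeps only the text from its last repetition. The exact pattern here is an illustrative assumption built from the question's pieces:

```python
import re

# user@host followed by a repeated *capturing* group (\.\w+)+
pattern = re.compile(r"(\w+)@(\w+)(\.\w+)+")
m = pattern.match("yasar@webmail.something.edu.tr")

# Group 3 matched three times, but only the final repetition is kept.
print(m.group(3))  # .tr
```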
This is what you are looking for:
The re module doesn't support repeated captures (the third-party regex module supports them). In your case I'd go with splitting the repeated subpatterns later. It leads to simple and readable code; e.g., see the code in @Li-aung Yip's answer.
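A sketch of the "match first, split later" approach using only the standard library (the regex module's repeated-capture support is not shown). The pattern and the helper name split_email are hypothetical; the regex only validates the overall shape, and ordinary string methods do the splitting:

```python
import re

# Validate the whole address; non-capturing groups, since the regex
# itself extracts nothing in this approach.
EMAIL = re.compile(r"\w+@\w+(?:\.\w+)+")


def split_email(address):
    """Hypothetical helper: return (user, [domain labels]) or None."""
    if EMAIL.fullmatch(address) is None:
        return None
    user, _, domain = address.partition("@")
    # Split the repeated '.label' parts after matching, instead of capturing them.
    return user, domain.split(".")


print(split_email("yasar@webmail.something.edu.tr"))
# ('yasar', ['webmail', 'something', 'edu', 'tr'])
```

Keeping the regex for validation and plain str methods for extraction avoids any dependence on how a regex engine records repeated groups.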
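You can fix the problem of (\.\w+)+ only capturing the last match by doing this instead: ((?:\.\w+)+). The non-capturing (?:...) group does the repeating, and the outer group captures everything those repetitions matched.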
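A short check of the ((?:\.\w+)+) pattern with the standard re module. The surrounding \w+@\w+ prefix is an assumption for the example. The outer group returns the whole run of suffixes as one string, which can then be split:

```python
import re

# Inner (?:...) repeats without capturing; the outer (...) spans all repetitions.
pattern = re.compile(r"\w+@\w+((?:\.\w+)+)")
m = pattern.match("yasar@webmail.something.edu.tr")

suffix = m.group(1)
print(suffix)                 # .something.edu.tr
# Leading '.' yields an empty first element, so drop it.
print(suffix.split(".")[1:])  # ['something', 'edu', 'tr']
```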
This will work:
But it's limited to a maximum of six subgroups. A better way to do this would be:
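One way to avoid a fixed number of groups, offered as an illustrative sketch rather than the answer's own code: match the user@host prefix once, check that the remainder is made only of .label repetitions, and then collect every label with re.findall, which returns all non-overlapping matches. The helper name domain_labels is hypothetical:

```python
import re

PREFIX = re.compile(r"\w+@\w+")
SUFFIX = re.compile(r"(?:\.\w+)+")
LABEL = re.compile(r"\.(\w+)")


def domain_labels(address):
    """Hypothetical helper: return the labels after user@host, or None."""
    m = PREFIX.match(address)
    if m is None:
        return None
    rest = address[m.end():]
    # Require the remainder to consist solely of '.label' repetitions.
    if SUFFIX.fullmatch(rest) is None:
        return None
    # With one capturing group, findall returns that group's text for every match.
    return LABEL.findall(rest)


print(domain_labels("yasar@webmail.something.edu.tr"))
# ['something', 'edu', 'tr']
```

There is no upper limit on the number of labels here, because findall keeps every match instead of overwriting one group.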
Note that regexps are fine so long as the email addresses are simple, but there are all kinds of valid addresses that this will break on. See this question for a detailed treatment of email address regexes.