Finding Acronyms Using Regex In Python

Posted 2019-02-26 23:44

I'm trying to use regex in Python to match acronyms separated by periods. I have the following code:

import re
test_string = "U.S.A."
pattern = r'([A-Z]\.)+'
print(re.findall(pattern, test_string))

The result of this is:

['A.']

I'm confused as to why this is the result. I know + is greedy, but why are the first occurrences of [A-Z]\. ignored?

Tags: python regex acronym

2 Answers

Reply #2 · 2019-02-27 00:40

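The (...) in a regex creates a capturing group. When a pattern contains a capturing group, re.findall returns the group's contents instead of the whole match, and a repeated group keeps only its last repetition, which here is 'A.'. I suggest making the group non-capturing: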

pattern = r'(?:[A-Z]\.)+'
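To make the difference concrete, here is a small sketch comparing the two patterns on the test string from the question. The variable names are just for illustration, and the re.finditer variant is an alternative I'm adding, not something the question used.

```python
import re

test_string = "U.S.A."

# Capturing group: findall returns what the group captured, and a
# repeated group only remembers its final repetition.
capturing = re.findall(r'([A-Z]\.)+', test_string)

# Non-capturing group: findall returns the whole match.
non_capturing = re.findall(r'(?:[A-Z]\.)+', test_string)

# Alternative: keep the capturing pattern and read the full match
# from each match object via group(0).
full_matches = [m.group(0) for m in re.finditer(r'([A-Z]\.)+', test_string)]

print(capturing)      # ['A.']
print(non_capturing)  # ['U.S.A.']
print(full_matches)   # ['U.S.A.']
```

Either the non-capturing group or group(0) works; the non-capturing form is the smaller change if you only need the matched text.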


Reply #3 · 2019-02-27 00:44

Description

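This regex matches runs of capital letters that are each followed by a period, where every letter must be preceded by a period or whitespace: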

(?:(?<=\.|\s)[A-Z]\.)+


Example

Sample Text

This is the U.S.A. we have RADAR.

Matches

U.S.A.
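As a quick check, this sketch runs the lookbehind pattern in Python on the sample text, and also on the bare string from the question. The variable names are hypothetical; the second case is an observation about how the lookbehind behaves, not something the answer states.

```python
import re

# Each capital letter must be preceded by a period or whitespace.
pattern = re.compile(r'(?:(?<=\.|\s)[A-Z]\.)+')

sample_text = "This is the U.S.A. we have RADAR."
sample_matches = pattern.findall(sample_text)
print(sample_matches)  # ['U.S.A.']

# At the very start of a string nothing precedes the first letter,
# so the lookbehind fails there and the match begins one letter later.
start_matches = pattern.findall("U.S.A.")
print(start_matches)  # ['S.A.']
```

So this pattern skips all-caps words like RADAR, but it loses the first letter of an acronym that starts the string; if that matters for your input, the simpler (?:[A-Z]\.)+ may be enough.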
