Regex - finding capital words in string

Posted 2020-07-09 06:51

I'm trying to learn how to use regular expressions but have a question. Let's say I have the string

line = 'Cow Apple think Woof'

I want to see if line has at least two words that begin with capital letters (which, of course, it does). In Python, I tried to do the following

import re
test = re.search(r'(\b[A-Z]([a-z])*\b){2,}',line)
print(bool(test))

but that prints False. If I instead do

test = re.search(r'(\b[A-Z]([a-z])*\b)',line)

I find that print(test.group(1)) is Cow but print(test.group(2)) is w, the last letter of the first match (there are no other elements in test.group).

Any suggestions on pinpointing this issue and/or how to approach the problem better in general?

Tags: python regex
3 Answers
够拽才男人
#2 · 2020-07-09 07:19

The last letter of the match ends up in group 2 because of the inner parentheses: a capturing group that is repeated keeps only its last repetition. Just drop those and you'll be fine.

>>> t = re.findall('([A-Z][a-z]+)', line)
>>> t
['Cow', 'Apple', 'Woof']
>>> t = re.findall('([A-Z]([a-z])+)', line)
>>> t
[('Cow', 'w'), ('Apple', 'e'), ('Woof', 'f')]

The count of capitalised words is, of course, len(t).
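
As a side note on why the original {2,} attempt prints False: the quantifier repeats the group back-to-back, so the second capitalized word would have to start exactly where the first one ends, leaving no room for the space between them. Below is a minimal sketch of that behaviour and two possible workarounds, assuming only a yes/no answer is needed. The helper name has_two_capitalized is made up for illustration and is not part of the original posts.

```python
import re

line = 'Cow Apple think Woof'

# The original pattern: the group must repeat with nothing in between,
# but a space separates 'Cow' and 'Apple', so the search finds nothing.
adjacent = re.search(r'(\b[A-Z]([a-z])*\b){2,}', line)

# Allowing arbitrary filler (.*?) after each word lets the group repeat.
# (?:...) is non-capturing, so no stray "last letter" group is created.
spaced = re.search(r'(?:\b[A-Z][a-z]*\b.*?){2,}', line)

def has_two_capitalized(text):
    """Hypothetical helper: True if text has at least two capitalized words."""
    return len(re.findall(r'\b[A-Z][a-z]*\b', text)) >= 2

print(bool(adjacent), bool(spaced), has_two_capitalized(line))
```

Counting with findall is probably the easier of the two to read; the single-regex variant is mainly useful when only one search call is allowed.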

霸刀☆藐视天下
#3 · 2020-07-09 07:36
import re

sent = "His email is abc@some.com, however his wife uses xyz@gmail.com"

x = re.findall(r'[A-Za-z]+@[A-Za-z.]+', sent)

print(x)

If there is a period right after an email address (abc@some.com.), it will be included at the end of the matched address. However, this can be dealt with separately.
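
One possible way to handle that trailing period, sketched under the assumption that stripping it after matching is acceptable. The sample sentence here is made up so that an address sits directly before a period.

```python
import re

# Made-up sample: the first address is followed by a sentence-ending period.
sent = "Write to abc@some.com. His wife uses xyz@gmail.com"

# The character class after '@' allows dots, so the sentence-ending
# period is swallowed into the first match.
raw = re.findall(r'[A-Za-z]+@[A-Za-z.]+', sent)

# Strip any trailing periods from each match afterwards.
cleaned = [addr.rstrip('.') for addr in raw]

print(raw)
print(cleaned)
```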

等我变得足够好
#4 · 2020-07-09 07:38

I use the findall function to find all instances that match the regex, then use len to see how many matches there are; in this case there are 3. You can check whether the length is at least 2 to get True or False.

import re

line = 'Cow Apple think Woof'

test = re.findall(r'(\b[A-Z]([a-z])*\b)',line)
print(len(test) >= 2)

If you want to use only regex, you can search for a capitalized word then some characters in between and another capitalized word.

test = re.search(r'(\b[A-Z][a-z]*\b)(.*)(\b[A-Z][a-z]*\b)',line)
print(bool(test))
  • (\b[A-Z][a-z]*\b) - finds a capitalized word
  • (.*) - matches 0 or more characters
  • (\b[A-Z][a-z]*\b) - finds the second capitalized word

This method isn't as flexible, since the pattern has to be extended by hand to require 3 capitalized words.
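
If a regex-only check for an arbitrary number of words is still wanted, the pattern can be built programmatically instead of written out by hand. This is a rough sketch, not the answerer's method; the names CAP_WORD and has_n_capitalized are hypothetical.

```python
import re

# Same capitalized-word pattern as above, without the capturing groups.
CAP_WORD = r'\b[A-Z][a-z]*\b'

def has_n_capitalized(text, n):
    """Hypothetical helper: True if text contains at least n capitalized words."""
    # Join n copies of the word pattern with '.*' so that anything
    # (except a newline) may sit between consecutive capitalized words.
    pattern = '.*'.join([CAP_WORD] * n)
    return re.search(pattern, text) is not None

line = 'Cow Apple think Woof'
print(has_n_capitalized(line, 3))
print(has_n_capitalized(line, 4))
```

For large n or long inputs the repeated '.*' can cause a lot of backtracking, so counting findall results is likely the safer default.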
