I'm trying to learn how to use regular expressions but have a question. Let's say I have the string
line = 'Cow Apple think Woof'
I want to see if line has at least two words that begin with capital letters (which, of course, it does). In Python, I tried to do the following:
import re
test = re.search(r'(\b[A-Z]([a-z])*\b){2,}',line)
print(bool(test))
but that prints False. If I instead do
test = re.search(r'(\b[A-Z]([a-z])*\b)',line)
I find that print(test.group(1)) is Cow but print(test.group(2)) is w, the last letter of the first match (there are no other elements in test.group).
Any suggestions on pinpointing this issue and/or how to approach the problem better in general?
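The last letter of the match ends up in group 2 because of the inner parentheses: ([a-z])* is a capturing group that repeats, and a repeated group keeps only its last iteration. Just drop those parentheses and you'll be fine.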
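A minimal sketch of what that looks like, assuming re.findall is used to collect the matches. The variable name t is taken from the next sentence; the exact call is an assumption, not necessarily the code originally posted. Note that the {2,} version in the question fails for a separate reason: the repeated group has to match back to back, and nothing in the pattern consumes the space between two words.

```python
import re

line = 'Cow Apple think Woof'

# Pattern from the question with the inner capturing parentheses removed.
# With no groups in the pattern, findall returns the full matches.
t = re.findall(r'\b[A-Z][a-z]*\b', line)

print(t)
print(len(t) >= 2)  # "at least two capitalised words"
```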
The count of capitalised words is, of course, len(t).
If there is a period at the end of an email ID (abc@some,com.), it will be returned at the end of the email address. However, this can be dealt with separately.
I use the findall function to find all instances that match the regex, then use len to see how many matches there are; in this case, it prints out 3. You can check whether the length is at least 2 and return True or False.
If you want to use only regex, you can search for a capitalized word, then some characters in between, and then another capitalized word.
(\b[A-Z][a-z]*\b) - finds a capitalized word
(.*) - matches 0 or more characters
(\b[A-Z][a-z]*\b) - finds the second capitalized word
This method isn't as flexible, since the pattern has to be rewritten if you want to match, say, 3 capitalized words.
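One way to make the regex-only approach less rigid is to repeat a "capitalized word followed by anything" unit a given number of times. The sketch below is only an illustration: has_n_capitalized is a hypothetical helper name, and the non-capturing group with a lazy .*? is a swapped-in variant of the pattern above rather than the method described in this answer. It assumes a single-line string, since . does not match newlines by default.

```python
import re

def has_n_capitalized(line, n):
    """Return True if line contains at least n capitalized words.

    Hypothetical helper: repeats (capitalized word + any filler) n times.
    """
    # (?:...) groups without capturing; .*? lazily skips the text
    # between one capitalized word and the next.
    pattern = r'(?:\b[A-Z][a-z]*\b.*?){%d}' % n
    return re.search(pattern, line) is not None

line = 'Cow Apple think Woof'
print(has_n_capitalized(line, 2))  # True
print(has_n_capitalized(line, 3))  # True
print(has_n_capitalized(line, 4))  # False
```

For anything beyond a yes/no check, counting the findall matches is usually easier to read than building the pattern this way.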