What is the best way to split a string like "HELLO there HOW are YOU"
by upper-case words (in Python)?
So I'd end up with a list like this: results = ['HELLO there', 'HOW are', 'YOU']
EDIT:
I have tried:
p = re.compile("\b[A-Z]{2,}\b")
print p.split(page_text)
It doesn't seem to work, though.
I suggest
l = re.compile(r"(?<!^)\s+(?=[A-Z])(?!.\s)").split(s)
Check this demo.
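For anyone who wants to try it locally, here is a small sketch of that pattern run against the sample string from the question. The names `text`, `pattern` and `parts` are only for illustration, and the second input is a made-up example to show what the trailing `(?!.\s)` appears to be for.

```python
import re

text = "HELLO there HOW are YOU"

# Split on whitespace that is not at the very start of the string and is
# followed by an upper-case letter. The final (?!.\s) refuses the split
# when that letter is a one-character word such as "I".
pattern = re.compile(r"(?<!^)\s+(?=[A-Z])(?!.\s)")

parts = pattern.split(text)
print(parts)

# Hypothetical input containing a single-letter upper-case word.
with_i = pattern.split("HELLO there I am HERE")
print(with_i)
```

On the sample string this gives the three pieces asked for; in the second input the lone "I" stays attached to the preceding piece.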
You could use a lookahead:
re.split(r'[ ](?=[A-Z]+\b)', input)
This splits at every space that is followed by a run of upper-case letters ending at a word boundary.
Note that the square brackets are only for readability and could as well be omitted.
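A minimal sketch of the lookahead split on the question's sample string, assuming the input is held in a variable called `text` (my name, not from the question):

```python
import re

text = "HELLO there HOW are YOU"

# The lookahead (?=...) only inspects what follows the space and consumes
# nothing, so the upper-case word stays at the start of the next piece.
parts = re.split(r'[ ](?=[A-Z]+\b)', text)
print(parts)

# Same pattern with the square brackets around the space left out.
same = re.split(r' (?=[A-Z]+\b)', text)
print(same == parts)
```

Only the space itself is removed by the split, which is why no upper-case word goes missing from the result.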
If it is enough that the first letter of a word is upper case (so if you also want to split in front of Hello), it gets even easier:
re.split(r'[ ](?=[A-Z])', input)
Now this splits at every space followed by any upper-case letter.
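To see the difference between the two patterns, here is a rough comparison on an input that also contains a capitalised word. The input string and the names `strict` and `loose` are made up for the illustration.

```python
import re

text = "HELLO there Hello HOW are YOU"

# Requires the whole following word to be upper case.
strict = re.split(r'[ ](?=[A-Z]+\b)', text)

# Only requires the first letter of the following word to be upper case.
loose = re.split(r'[ ](?=[A-Z])', text)

print(strict)
print(loose)
```

With the first pattern "Hello" stays inside the first piece, because the single "H" is followed by a lower-case letter and not by a word boundary. With the second pattern "Hello" starts a piece of its own.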
You don't need split, but rather findall:
re.findall(r'[A-Z]+[^A-Z]*', str)
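A quick sketch of the findall approach on the sample string. The variable names are mine, and the `strip()` step is an addition that is only needed if the trailing whitespace matters to you.

```python
import re

text = "HELLO there HOW are YOU"

# Each match is a run of upper-case letters plus everything up to the
# next upper-case letter, so the separating space is kept in the chunk.
chunks = re.findall(r'[A-Z]+[^A-Z]*', text)
print(chunks)

# Drop the trailing whitespace to get the list from the question.
results = [chunk.strip() for chunk in chunks]
print(results)
```

Note that this pattern starts a new chunk at any upper-case letter, so a capitalised word like "Hello" would also begin a chunk, and any text before the first upper-case letter is not returned at all.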
Your question contains the string literal "\b[A-Z]{2,}\b", but that \b will mean backspace, because there is no r prefix to make it a raw string.
Try: r"\b[A-Z]{2,}\b".
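Here is a small sketch that makes the difference visible. The sample string is taken from the question; the variable names are only for illustration.

```python
import re

plain = "\b[A-Z]{2,}\b"   # Python turns each "\b" into a backspace character
raw = r"\b[A-Z]{2,}\b"    # backslash and "b" reach the regex engine intact

print(plain[0] == "\x08")    # the first character is a backspace
print(len(plain), len(raw))  # the raw string is two characters longer

text = "HELLO there HOW are YOU"

# The non-raw pattern looks for literal backspace characters and finds none.
no_matches = re.findall(plain, text)
print(no_matches)

# The raw pattern matches the upper-case words as intended.
words = re.findall(raw, text)
print(words)

# split() discards whatever the pattern matches, so the words themselves
# are dropped from the result even with the corrected pattern.
pieces = re.split(raw, text)
print(pieces)
```

So the raw string fixes the matching, but splitting on the words themselves still removes them from the output. To keep them, the split has to happen on the space in front of each word, for example with a lookahead.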