I am trying to solve this problem from HackerRank. It is a machine learning problem. Initially, I tried to read all the words from the corpus file to build unigram frequencies. According to the problem, a word is defined as follows:
A word is a sequence of characters containing only letters from a to z (lowercase only); it can also contain hyphens (-) and apostrophes ('). A word should begin and end with lowercase letters only.
I wrote a regular expression in Python like this:
pat = "[a-z]+( ['-]+[a-z]+ ){0,}"
I tried using both re.search() and re.findall(), and I have problems with both.
Problem with re.findall():
string = "HELLO W-O-R-L-D"
Output of re.findall():
[('Hello', ''), ('W', '-D')]
I couldn't get the word W-O-R-L-D. When using re.search(), I was able to get it correctly.

Problem with re.search():
string = "123hello456world789"
Output of re.search():
'hello'
In this case, when using re.findall(), I could get both 'hello' and 'world'.
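
For reference, here is a minimal sketch that reproduces the two behaviours on lowercase input. It rests on assumptions that are not in the question: the pattern is written without the spaces inside the group, the input strings are lowercased, and the names `capturing`, `non_capturing`, and `text` are made up for illustration. It relies on two documented properties of the `re` module: `re.findall()` returns the contents of the capturing group (rather than the whole match) when the pattern contains one, and `re.search()` returns only the first match.

```python
import re

# Assumption: the pattern from the question, without spaces inside the group.
capturing = r"[a-z]+(['-]+[a-z]+){0,}"
# Same pattern with a non-capturing group (?:...), so findall returns whole matches.
non_capturing = r"[a-z]+(?:['-]+[a-z]+)*"

text = "hello w-o-r-l-d"  # hypothetical lowercase input

# With one capturing group, findall returns what that group captured
# (its last repetition, or '' if it never matched), not the full match.
group_results = re.findall(capturing, text)
print(group_results)  # ['', '-d']

# With no capturing groups, findall returns each full match.
full_results = re.findall(non_capturing, text)
print(full_results)  # ['hello', 'w-o-r-l-d']

# search stops at the first match; finditer visits every match,
# and group(0) is always the full match regardless of groups.
first = re.search(non_capturing, "123hello456world789").group(0)
every = [m.group(0) for m in re.finditer(capturing, "123hello456world789")]
print(first)  # hello
print(every)  # ['hello', 'world']
```

If this reading is right, the tuples in the re.findall() output come from the capturing groups, and the single result from re.search() is simply its first match.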