I'm having trouble finding the correct regular expression for the scenario below:
Let's say:
a = "this is a sample"
I want to match a whole word. For example, a match for "hi" should return False since "hi" is not a word, and "is" should return True since there is no alpha character on the left and on the right side.
I think the behavior the OP wanted was not completely achieved by the answers given. Specifically, the desired boolean output was not produced. The answers given do help illustrate the concept, and I think they are excellent. Perhaps I can illustrate what I mean by explaining why I think the OP chose these particular examples.
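The string given was,
a = "this is a sample"
The OP then stated,
match "hi" should return False since "hi" is not a word
As I understand, the reference is to the search token "hi" as it is found in the word "this". If someone were to search the string a for the word "hi", they should receive False as the response.
The OP continues,
"is" should return True since there is no alpha character on the left and on the right side.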
In this case, the reference is to the search token "is" as it is found in the word "is". I hope this helps clarify why we use word boundaries. The other answers have the behavior of "don't return a word unless that word is found by itself, not inside of other words." The "word boundary" shorthand character class does this job nicely.

Only the word "is" has been used in examples up to this point. I think that these answers are correct, but I think that there is more of the question's fundamental meaning that needs to be addressed. The behavior of other search strings should be noted to understand the concept. In other words, we need to generalize the (excellent) answer by @georg using re.search(r"\bis\b", your_string). The same r"\bis\b" concept is also used in the answer by @OmPrakash, who started the generalizing discussion by showing an example.

Let's say the method which should exhibit the behavior I've discussed is named find_only_whole_word.
The following behavior should then be expected: searching the string a for "hi" returns False, and searching it for "is" returns True.
Once again, this is how I understand the OP's question. We have a step towards that behavior with the answer from @georg, but it's a little hard to interpret/implement. To wit:
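As a rough sketch of what is meant (the two calls are an illustrative assumption, built from the pattern and string discussed above): re.search returns a match object when the word is found and None when it is not. In an interactive session a bare None is not echoed, so the second call displays nothing.

```python
import re

a = "this is a sample"

# First command: "is" occurs as a whole word, so a match object comes back.
first = re.search(r"\bis\b", a)

# Second command: "hi" only occurs inside "this", so the result is None.
second = re.search(r"\bhi\b", a)

print(first)   # a match object (truthy)
print(second)  # None
```

Neither result is literally True or False, which is why a small wrapper is useful.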
There is no output from the second command. The useful answer from @OmPrakash shows output, but not True or False.

Here's a more complete sampling of the behavior to be expected.
This can be accomplished by the following code:
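A minimal sketch of such a function, assuming the name find_only_whole_word (taken from the file name mentioned in this answer) and an argument order of search string first, input string second. The argument order and the use of re.escape are my assumptions, not necessarily what the answer used.

```python
import re


def find_only_whole_word(search_string, input_string):
    """Return True if search_string occurs in input_string as a whole word."""
    # re.escape keeps characters such as "(" or "." in the search text literal.
    pattern = r"\b" + re.escape(search_string) + r"\b"
    # re.search returns a match object or None; turn that into a boolean.
    return re.search(pattern, input_string) is not None


a = "this is a sample"
print(find_only_whole_word("hi", a))      # False
print(find_only_whole_word("is", a))      # True
print(find_only_whole_word("sample", a))  # True
print(find_only_whole_word("samp", a))    # False
```

To ignore case, passing flags=re.IGNORECASE to re.search should be enough.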
A simple demonstration follows. Run the Python interpreter from the same directory where you saved the file, find_only_whole_word.py.

The trouble with regex is that if the string you want to search for in another string has regex characters, it gets complicated. Any string with brackets will fail.
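To illustrate the point, here is a small sketch; the sample text and search word are made up. An unbalanced bracket in the search text makes the pattern invalid, and re.escape is one standard-library way around that if you do want to stay with regex.

```python
import re

text = "call foo(bar) now"  # hypothetical sample text
word = "foo(bar"            # search text containing a regex metacharacter

try:
    re.search(r"\b" + word + r"\b", text)
    failed = False
except re.error:
    # The unescaped "(" opens a group that is never closed.
    failed = True

# re.escape turns "(" into "\(" so it is matched literally.
escaped_hit = re.search(r"\b" + re.escape(word) + r"\b", text) is not None

print(failed, escaped_hit)  # True True
```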
This code will find a word
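A minimal sketch of such a function, using only str methods. The name word_find and the argument order are hypothetical, chosen here for illustration.

```python
def word_find(word, text):
    """Return True if word appears in text between spaces or at the very end."""
    # str.find returns an index (-1 when absent), so compare it explicitly;
    # str.endswith already returns a boolean.
    if text.find(" " + word + " ") != -1 or text.endswith(" " + word):
        return True
    return False


a = "this is a sample"
print(word_find("is", a))      # True
print(word_find("hi", a))      # False
print(word_find("sample", a))  # True
```

As sketched, only spaces count as separators, so a word at the very start of the string or next to punctuation is not found.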
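The first part of the conditional searches for the text with a space on each side and the second part catches the end of string situation. Note that endswith is boolean whereas find returns an integer.

Try using the "word boundary" character class, \b, in the regex module, re. From the documentation of re.search(): it scans through the string looking for the first location where the pattern produces a match, and returns None if no position in the string matches.

Try re.search(r"\bis\b", your_string).
From the docs: \b matches the empty string, but only at the beginning or end of a word.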
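A short sketch of the difference the boundary makes, using the sample string from the question and re.findall to list every hit:

```python
import re

your_string = "this is a sample"

# Without boundaries, "is" is also found inside "this".
loose = re.findall(r"is", your_string)

# With \b on both sides, only the standalone word matches.
whole = re.findall(r"\bis\b", your_string)

print(loose)  # ['is', 'is']
print(whole)  # ['is']
```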
Note that the re module uses a naive definition of "word" as a "sequence of alphanumeric or underscore characters", where "alphanumeric" depends on locale or unicode options.

Also note that without the raw string prefix, \b is seen as "backspace" instead of regex word boundary.
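Both notes can be checked with a few lines. This is only a sketch, and the underscore example string is made up.

```python
import re

a = "this is a sample"

# Without the r prefix, "\b" is the single backspace character (0x08).
plain = "\bis\b"
raw = r"\bis\b"

print(len(plain), len(raw))           # 4 6
print(re.search(plain, a))            # None
print(re.search(raw, a) is not None)  # True

# Underscore counts as a word character, so "is" is not a whole word here.
print(re.search(raw, "this_is_a_sample"))  # None
```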