Find substring in string but only if whole words?

What is an elegant way to look for a string within another string in Python, but only if the substring matches whole words rather than part of a word?

Perhaps an example will demonstrate what I mean:

string1 = "ADDLESHAW GODDARD"
string2 = "ADDLESHAW GODDARD LLP"
assert string_found(string1, string2)  # this should be True
string1 = "ADVANCE"
string2 = "ADVANCED BUSINESS EQUIPMENT LTD"
assert not string_found(string1, string2)  # this should be False

How can I best write a function called string_found that will do what I need? I thought perhaps I could fudge it with something like this:

def string_found(string1, string2):
    # str.find returns -1 (which is truthy) when nothing is found, so compare explicitly
    if string2.find(string1 + " ") != -1:
        return True
    return False

But that doesn't feel very elegant, and also wouldn't match string1 if it was at the end of string2. Maybe I need a regex? (argh regex fear)
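Both weaknesses of the trailing-space idea are easy to see with a couple of inputs. This is a minimal sketch; `naive_found` is a hypothetical name, and it uses an `in` test rather than `find` so the result is a plain boolean.

```python
def naive_found(string1, string2):
    # Hypothetical helper: the "append a space and search" idea from the question
    return (string1 + " ") in string2

# Misses a whole word at the very end of string2 (no trailing space there)
print(naive_found("GODDARD", "ADDLESHAW GODDARD"))  # False

# Accepts a fragment that merely ends a longer word
print(naive_found("DDARD", "ADDLESHAW GODDARD LLP"))  # True
```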

Tags: python search string substring

3 answers

We Are One

Post #2 · 2019-01-09 15:11

One approach uses the re (regular expression) module:

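import re

string1 = "pizza pony"
string2 = "who knows what a pizza pony is?"

# re.escape makes any regex metacharacters in string1 match literally;
# ending with \b instead of \W also allows a match at the very end of string2
search_result = re.search(r"\b" + re.escape(string1) + r"\b", string2)

if search_result:  # re.search returns None when there is no match
    print(search_result.group())  # prints: pizza pony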
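Ending the pattern with `\W` rather than `\b` makes a difference when the match sits at the end of the string: `\W` has to consume one actual non-word character after the needle, while `\b` is zero-width. A small comparison using the same needle (the haystacks are made up for illustration):

```python
import re

needle = "pizza pony"
pattern_w = r"\b" + re.escape(needle) + r"\W"  # requires a trailing non-word character
pattern_b = r"\b" + re.escape(needle) + r"\b"  # zero-width word boundary

for haystack in ["a pizza pony?", "a pizza pony"]:
    print(repr(haystack),
          bool(re.search(pattern_w, haystack)),
          bool(re.search(pattern_b, haystack)))
```

With `\W`, the first haystack matches but the second does not, and the matched text also includes the trailing character (`'pizza pony?'`); with `\b`, both match and the match is just `'pizza pony'`.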


一夜七次

Post #3 · 2019-01-09 15:12

You can use regular expressions and the word boundary special character \b, which the re module documentation describes as follows:

Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that \b is defined as the boundary between \w and \W, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. Inside a character range, \b represents the backspace character, for compatibility with Python’s string literals.
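Because `\b` is defined in terms of `\w` (letters, digits and underscore), punctuation such as a hyphen counts as a boundary but an underscore does not. In Python 3, `str` patterns use Unicode matching by default, so accented letters are word characters too. A short sketch illustrating the quoted definition (the company-style strings are made up):

```python
import re

pattern = r"\bADVANCE\b"
print(bool(re.search(pattern, "ADVANCE-CO LTD")))   # True: '-' is not a word character
print(bool(re.search(pattern, "ADVANCE_CO LTD")))   # False: '_' is a word character
print(bool(re.search(pattern, "ADVANCED CO LTD")))  # False: 'D' continues the word

# Python 3 str patterns are Unicode-aware, so 'É' counts as a word character
print(bool(re.search(r"\bCAF\b", "CAFÉ BAR")))      # False
```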

import re

def string_found(string1, string2):
    # re.escape treats string1 literally; \b requires a word boundary on each side
    return re.search(r"\b" + re.escape(string1) + r"\b", string2) is not None
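One caveat: `\b` only marks a boundary next to a word character, so a `\b`-wrapped pattern fails when `string1` itself begins or ends with punctuation (for example `C++`). A variant using the lookarounds `(?<!\w)` and `(?!\w)` instead asserts "no word character immediately before/after", which behaves the same for ordinary words. This is a sketch, not part of the answer above; `string_found_lookaround` is a hypothetical name.

```python
import re

def string_found_lookaround(string1, string2):
    # (?<!\w): not preceded by a word character; (?!\w): not followed by one
    pattern = r"(?<!\w)" + re.escape(string1) + r"(?!\w)"
    return re.search(pattern, string2) is not None

print(string_found_lookaround("C++", "I write C++ daily"))                     # True
print(bool(re.search(r"\b" + re.escape("C++") + r"\b", "I write C++ daily")))  # False
print(string_found_lookaround("ADVANCE", "ADVANCED BUSINESS EQUIPMENT LTD"))   # False
```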


If only whitespace counts as a word boundary for you, you could also get away with adding a space to the start and end of both strings:

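def string_found(string1, string2):
    # Pad both strings so a match at the start or end of string2
    # still has a space on each side
    string1 = " " + string1.strip() + " "
    string2 = " " + string2.strip() + " "
    # str.find returns an index (possibly 0, which is falsy) or -1 (truthy),
    # so test membership instead
    return string1 in string2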
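The padding trick only treats a literal space as a separator. If the strings may contain tabs, newlines or runs of spaces, one option (an assumption about the input, not part of the answer above) is to normalize whitespace first with `str.split()` and `" ".join()`. `string_found_normalized` is a hypothetical name.

```python
def string_found_normalized(string1, string2):
    # str.split() with no argument splits on any run of whitespace and drops
    # leading/trailing whitespace, so joining with " " yields single spaces
    needle = " " + " ".join(string1.split()) + " "
    haystack = " " + " ".join(string2.split()) + " "
    return needle in haystack

print(string_found_normalized("ADDLESHAW GODDARD", "ADDLESHAW\tGODDARD  LLP"))  # True
print(string_found_normalized("ADVANCE", "ADVANCED BUSINESS EQUIPMENT LTD"))     # False
```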


兄弟一词,经得起流年.

Post #4 · 2019-01-09 15:16

Here's a way to do it without a regex (as requested), assuming that you want any whitespace to serve as a word separator.

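import string

def find_substring(needle, haystack):
    # Check every occurrence, not just the first: in "ADVANCED ADVANCE",
    # the first hit for "ADVANCE" is inside "ADVANCED"
    index = haystack.find(needle)
    while index != -1:
        end = index + len(needle)
        # A whole word is bounded by whitespace or by the ends of the string
        starts_ok = index == 0 or haystack[index - 1] in string.whitespace
        ends_ok = end == len(haystack) or haystack[end] in string.whitespace
        if starts_ok and ends_ok:
            return True
        index = haystack.find(needle, index + 1)
    return False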

A demo was posted on codepad (codepad is a great idea; thanks to Felix Kling for reminding me).
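Another regex-free option, again assuming words are separated by whitespace, is to compare word sequences rather than characters: split both strings into words and look for the needle's words as a contiguous run in the haystack's words. This is a different technique from the character-index approach above, offered as a sketch; `string_found_tokens` is a hypothetical name.

```python
def string_found_tokens(string1, string2):
    needle = string1.split()
    words = string2.split()
    n = len(needle)
    if n == 0:
        return False  # treat an empty or all-whitespace needle as "not found"
    # Slide a window of n words across string2 and compare word by word
    return any(words[i:i + n] == needle for i in range(len(words) - n + 1))

print(string_found_tokens("ADDLESHAW GODDARD", "ADDLESHAW GODDARD LLP"))  # True
print(string_found_tokens("ADVANCE", "ADVANCED BUSINESS EQUIPMENT LTD"))  # False
print(string_found_tokens("GODDARD", "ADDLESHAW GODDARD"))                # True
```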
