Check if multiple strings exist in another string

2018-12-31 07:50发布

How can I check if any of the strings in an array exists in another string?

Like:

a = ['a', 'b', 'c']
str = "a123"
if a in str:
  print "some of the strings found in str"
else:
  print "no strings found in str"

That code doesn't work, it's just to show what I want to achieve.

12条回答
皆成旧梦
2楼-- · 2018-12-31 08:22

any() is by far the best approach if all you want is True or False, but if you want to know specifically which string/strings match, you can use a couple things.

If you want the first match (with False as a default):

match = next((x for x in a if x in str), False)

If you want to get all matches (including duplicates):

matches = [x for x in a if x in str]

If you want to get all non-duplicate matches (disregarding order):

matches = {x for x in a if x in str}

If you want to get all non-duplicate matches in the right order:

matches = []
for x in a:
    if x in str and x not in matches:
        matches.append(x)
查看更多
旧时光的记忆
3楼-- · 2018-12-31 08:22

You should be careful if the strings in a or str gets longer. The straightforward solutions take O(S*(A^2)), where S is the length of str and A is the sum of the lenghts of all strings in a. For a faster solution, look at Aho-Corasick algorithm for string matching, which runs in linear time O(S+A).

查看更多
浪荡孟婆
4楼-- · 2018-12-31 08:33

I would use this kind of function for speed:

def check_string(string, substring_list):
    for substring in substring_list:
        if substring in string:
            return True
    return False
查看更多
爱死公子算了
5楼-- · 2018-12-31 08:37

jbernadas already mentioned the Aho-Corasick-Algorithm in order to reduce complexity.

Here is one way to use it in Python:

  1. Download aho_corasick.py from here

  2. Put it in the same directory as your main Python file and name it aho_corasick.py

  3. Try the alrorithm with the following code:

    from aho_corasick import aho_corasick #(string, keywords)
    
    print(aho_corasick(string, ["keyword1", "keyword2"]))
    

Note that the search is case-sensitive

查看更多
一个人的天荒地老
6楼-- · 2018-12-31 08:39

Just to add some diversity with regex:

import re

if any(re.findall(r'a|b|c', str, re.IGNORECASE)):
    print 'possible matches thanks to regex'
else:
    print 'no matches'

or if your list is too long - any(re.findall(r'|'.join(a), str, re.IGNORECASE))

查看更多
爱死公子算了
7楼-- · 2018-12-31 08:39
a = ['a', 'b', 'c']
str =  "a123"

a_match = [True for match in a if match in str]

if True in a_match:
  print "some of the strings found in str"
else:
  print "no strings found in str"
查看更多
登录 后发表回答