可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I want to find all the counts (overlapping and non-overlapping) of a sub-string in a string.
I found two answers one of which is using regex which is not my intention and the other was much more in-efficient than I need.
I need something like:
'ababaa'.count('aba') == 2
str.count()
just counts simple substrings. What should I do?
回答1:
def sliding(a, n):
return (a[i:i+n] for i in xrange(len(a) - n + 1))
def substring_count(a, b):
return sum(s == b for s in sliding(a, len(b)))
assert list(sliding('abcde', 3)) == ['abc', 'bcd', 'cde']
assert substring_count('ababaa', 'aba') == 2
回答2:
Does this do the trick?
def count(string, substring):
n = len(substring)
cnt = 0
for i in range(len(string) - n):
if string[i:i+n] == substring:
cnt += 1
return cnt
print count('ababaa', 'aba') # 2
I don't know if there's a more efficient solution, but this should work.
回答3:
count = len(set([string.find('aba',x) for x in range(len(string)) if string.find('aba',x) >= 0]))
回答4:
Here, using re.finditer() is the best way to achieve what you want.
import re
def get_substring_count(s, sub_s):
return sum(1 for m in re.finditer('(?=%s)' % sub_s, s))
get_substring_count('ababaa', 'aba')
# 2 as response
回答5:
Here's a function you could use:
def count(haystack, needle):
return len([x for x in [haystack[i:j+1] for i in xrange(len(haystack)) for j in xrange(i,len(haystack))] if x == needle])
Then:
>>> count("ababaa", "aba")
2
回答6:
Looping through sliced string
def count_substring(string, sub_string):
l = len(sub_string)
n = len(string)
count = sum(1 for i in range(n-l+1) if string[i:i+l].count(sub_string)>0 )
return count
回答7:
A brute-force approach is just
n = len(needle)
count = sum(haystack[i:i+n] == needle for i in range(len(haystack)-n+1))
(this works because in Python True
and False
are equivalent to numbers 1
and 0
for most uses, including math).
Using a regexp instead it could be
count = len(re.findall(needle[:1]+"(?="+re.escape(needle[1:])+")",
haystack))
(i.e. using a(?=ba)
instead of aba
to find overlapping matches too)
回答8:
Another way to consider is by leveraging the Counter container. While the accepted answer is fastest for shorter strings, if you are searching relatively short substrings within long strings the Counter approach starts to take the edge. Also, if you have need to refactor this to perform multiple substring count queries against the same main string, then the Counter approach starts looking much more attractive
For example, searching for a substring of length = 3 gave me the following results using timeit;
Main string length / Accepted Answer / Counter Approach
6 characters / 4.1us / 7.4us
50 characters / 24.4us / 25us
150 characters / 70.7us / 64.9us
1500 characters / 723us / 614us
from collections import Counter
def count_w_overlap(search_string, main_string):
#Split up main_string into all possible overlap possibilities
search_len = len(search_string)
candidates = [main_string[i:i+search_len] for i in range(0, len(main_string) - search_len + 1)]
#Create the Counter container
freq_count = Counter(candidates)
return freq_count[search_string]