I'm looking for the most efficient way to match the end of a single string against a value from a predefined list of strings.
Something like
my_str='QWERTY'
my_lst=['QWE','QQQQ','TYE','YTR','TY']
match='TY'
or match=['TY']
Under the restrictions:
- len(my_lst) is known but arbitrary, so it could be very long; probably around 30.
- Elements in my_lst may have different len, so I can't just check a fixed last portion of my_str every time.
- my_str, as well as the matching elements in my_lst, can be either strings or lists, whichever is more efficient (see background).
- len(my_str) is mostly small, no longer than 8 characters.
- The in operator wouldn't do, as I need the matching to occur exclusively at the end.
- endswith is no use on its own, since it would only return a Boolean.
- The match should always be unique or [], as no elements in my_lst would share an ending with one another.
A little background (may skip):
I started with this as a list problem, such as ['Q','W','E','R','T','Y'], where I would have a list of lists of 1-character strings for the matching, and I was thinking of running a reverse iteration with [::-1] to check every candidate.
Then I realized it was possible to concatenate the inner lists, since they contained only strings, and run the same logic on the resulting strings.
Finally I came across the endswith string method while reading this question, but it wasn't quite what I needed. Furthermore, my problem can't be generalized to be solved with the os module or similar, since it's a string problem, not a pathing one.
End of background.
I approached it in these two ways:
match=filter(lambda x: my_str.endswith(x), my_lst)
match=[x for x in my_lst if my_str.endswith(x)]
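For reference, a minimal runnable sketch of both approaches on the example data above. It assumes Python 3, where filter returns a lazy iterator, so the result is wrapped in list() to compare it with the list comprehension. The names match_filter and match_comp are only illustrative.

```python
my_str = 'QWERTY'
my_lst = ['QWE', 'QQQQ', 'TYE', 'YTR', 'TY']

# Approach 1: filter with a lambda (wrapped in list() because Python 3's
# filter returns an iterator rather than a list).
match_filter = list(filter(lambda x: my_str.endswith(x), my_lst))

# Approach 2: list comprehension.
match_comp = [x for x in my_lst if my_str.endswith(x)]

print(match_filter)  # ['TY']
print(match_comp)    # ['TY']
```

Both versions test every candidate, so the cost grows with len(my_lst).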
I succeeded but I would like to know if there is some built-in or best way to find and return the matched ending value.
Thanks.
Here's a way using a trie, or prefix tree (technically a suffix tree in this situation). If we had three potential suffixes CA, CB, and BA, our suffix tree would have the empty string e at the root, with each suffix stored back to front along a branch: e -> A -> C for CA, e -> A -> B for BA, and e -> B -> C for CB.
We start at the end of the input string and consume characters. If we run across the beginning of the string or a character that is not a child of the current node, then we reject the string. If we reach a leaf of the tree, then we accept the string. This lets us scale better to very many potential suffixes.
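The walk described above can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: it assumes the trie is a nested dict keyed by single characters, with an empty dict marking a leaf, and it relies on the question's guarantee that no suffix is itself the ending of another. The function names build_trie and has_suffix follow the ones mentioned in this answer, but their bodies here are an assumption.

```python
def build_trie(suffixes):
    """Build a nested-dict trie, inserting each suffix back to front."""
    trie = {}
    for suffix in suffixes:
        node = trie
        for char in reversed(suffix):
            # Create the child node if it does not exist yet, then descend.
            node = node.setdefault(char, {})
    return trie


def has_suffix(trie, string):
    """Return True if `string` ends with one of the suffixes in `trie`."""
    node = trie
    for char in reversed(string):
        if not node:          # reached a leaf: a whole suffix matched
            return True
        if char not in node:  # no stored suffix continues with this character
            return False
        node = node[char]
    # Ran out of input: accept only if we stopped exactly on a leaf.
    return not node


trie = build_trie(['CA', 'CB', 'BA'])
print(trie == {'A': {'C': {}, 'B': {}}, 'B': {'C': {}}})  # True
print(has_suffix(trie, 'XXCA'))  # True
print(has_suffix(trie, 'XXAC'))  # False
```

Each lookup touches at most as many nodes as the longest suffix has characters, regardless of how many suffixes are stored.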
If you want to return the matching suffix, that's just a matter of tracking the characters we see as we descend the trie.
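One way this tracking might look, as a sketch under the same assumptions (a nested-dict trie with empty dicts as leaves, and no suffix being the ending of another). The name find_suffix and the choice to return None when nothing matches are hypothetical, not part of the original answer.

```python
def build_trie(suffixes):
    """Build a nested-dict trie, inserting each suffix back to front."""
    trie = {}
    for suffix in suffixes:
        node = trie
        for char in reversed(suffix):
            node = node.setdefault(char, {})
    return trie


def find_suffix(trie, string):
    """Return the stored suffix that `string` ends with, or None."""
    node = trie
    seen = []  # characters consumed so far, in back-to-front order
    for char in reversed(string):
        if not node:          # reached a leaf: stop consuming input
            break
        if char not in node:  # mismatch before reaching a leaf
            return None
        seen.append(char)
        node = node[char]
    if node:                  # input ended in the middle of a suffix
        return None
    return ''.join(reversed(seen))


trie = build_trie(['QWE', 'QQQQ', 'TYE', 'YTR', 'TY'])
print(find_suffix(trie, 'QWERTY'))  # TY
print(find_suffix(trie, 'QWERTZ'))  # None
```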
It's worth noting that the empty trie can be reached by both build_trie(['']) and build_trie([]), and matches the empty string at the end of all strings. To avoid this, you could check the length of suffixes and return some non-dict value, which you would check against in has_suffix.
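A possible version of that guard is sketched below. Using None as the non-dict value, and dropping empty strings before checking the length, are both assumptions made for illustration; any sentinel that has_suffix can recognise would serve.

```python
def build_trie(suffixes):
    """Build a nested-dict trie, or return None if there is nothing to match."""
    # Assumption: empty strings are discarded so [''] behaves like [].
    suffixes = [s for s in suffixes if s]
    if len(suffixes) == 0:
        return None  # non-dict sentinel meaning "no suffixes"
    trie = {}
    for suffix in suffixes:
        node = trie
        for char in reversed(suffix):
            node = node.setdefault(char, {})
    return trie


def has_suffix(trie, string):
    """Return True if `string` ends with a stored suffix; never for a None trie."""
    if trie is None:          # the guard: an empty suffix list matches nothing
        return False
    node = trie
    for char in reversed(string):
        if not node:          # reached a leaf: a whole suffix matched
            return True
        if char not in node:
            return False
        node = node[char]
    return not node


print(has_suffix(build_trie([]), 'QWERTY'))      # False
print(has_suffix(build_trie(['']), 'QWERTY'))    # False
print(has_suffix(build_trie(['TY']), 'QWERTY'))  # True
```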