I have a dictionary whose keys come in sets that share the same prefix, like this:
d = { "key1":"valA", "key123":"valB", "key1XY":"valC",
"key2":"valD", "key2-22":"valE" }
Given a query string, I need to look up all the values associated with keys that start with that prefix, e.g. for query="key1"
I need to get ["valA", "valB", "valC"]
My implementation below works but is too slow for a large number of queries since the dictionary d
has about 30,000 keys and most of the keys are more than 20 characters long:
result = [d[s] for s in d.keys() if s.startswith(query)]
Is there a faster/more efficient way to implement this?
The
sortedContainers
lib has a SortedDict implementation, once you have sorted dict you can bisect_left to find where to start, bisect_right to find the last position then use irange to get the keys in the range:Once you maintain a sorteddict using the sorteddict is a lot more efficint:
You could use a suffix tree:
Output
You can avoid producing the intermediate list generated by
dict.keys()
(in python 2.x):But you most likely want to use a trie instead of a dictionary, so you can find all the values associated with a key with a common prefix (a trie is similar to a tree based on prefixes).
Here you can find some different implementation of tries.
Let's compare the timings for the different solutions: