I have a lot (>100,000) lowercase strings in a list, where a subset might look like this:
str_list = ["hello i am from denmark", "that was in the united states", "nothing here"]
I further have a dict like this (in reality this is going to have a length of around ~1000):
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}
For all strings in the list which contain any of the dict's keys, I want to replace the entire string with the corresponding dict value. The expected result should thus be:
str_list = ["dk", "us", "nothing here"]
What is the most efficient way to do this given the number of strings I have and the length of the dict?
Extra info: There is never more than one dict key in a string.
Something like this would work. Note that this will convert the string to the first encountered key fitting the criteria. If there are multiple you may want to modify the logic based on whatever fits your use case.
You can subclass
dict
and use a list comprehension.In terms of performance, I advise you try a few different methods and see what works best.
Assuming:
You can do:
which returns:
The cool thing about this (apart from it being a python-ninjas favorite weapon aka list-comprehension) is the
get
with a default ofmy_str
andnext
with aStopIteration
value ofNone
that triggers the above default.This seems to be a good way:
Try
This iterates through the key, value pairs in your dictionary and looks to see if the key is in the string. If it is, it replaces the string with the value.