Efficiently identifying whether part of string is

I have a large number (>100,000) of lowercase strings in a list, a subset of which might look like this:

str_list = ["hello i am from denmark", "that was in the united states", "nothing here"]

I also have a dict like this (in reality it will have around 1,000 entries):

dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

For all strings in the list which contain any of the dict's keys, I want to replace the entire string with the corresponding dict value. The expected result should thus be:

str_list = ["dk", "us", "nothing here"]

What is the most efficient way to do this given the number of strings I have and the length of the dict?

Extra info: There is never more than one dict key in a string.
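One option not used in the answers is to compile all keys into a single regular expression, so each string is scanned by one `search` call instead of up to ~1,000 separate `in` checks. This is a minimal sketch, assuming plain substring semantics (the same as `key in string`, so no word-boundary checks) and relying on the guarantee that a string contains at most one key. Whether it actually beats the plain loops depends on the data, so it is worth timing on the real lists.

```python
import re

dict_x = {"denmark": "dk", "germany": "ger", "norway": "no", "united states": "us"}
str_list = ["hello i am from denmark", "that was in the united states", "nothing here"]

# One alternation of every key. re.escape guards against regex metacharacters
# in the keys; sorting longest-first means that when two keys start at the
# same position, the longer one is tried (and matched) first.
pattern = re.compile("|".join(re.escape(k) for k in sorted(dict_x, key=len, reverse=True)))

def replace_if_contains(s):
    match = pattern.search(s)  # leftmost occurrence of any key, or None
    return dict_x[match.group(0)] if match else s

result = [replace_if_contains(s) for s in str_list]
print(result)  # ['dk', 'us', 'nothing here']
```

The pattern is built once, so its cost is paid a single time rather than per string.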

Tags: python string list match

5 answers

ら.Afraid

Reply #2 · 2019-06-24 15:06

Something like this would work. Note that this replaces the string with the value of the first key found in it that fits the criteria. If there can be multiple matches, you may want to modify the logic to fit your use case.

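dict_x = {"denmark": "dk", "germany": "ger", "norway": "no", "united states": "us"}
strings = ["hello i am from denmark", "that was in the united states", "nothing here"]
converted = []
for string in strings:
    updated_string = string  # keep the original if no key matches
    for key, value in dict_x.items():
        if key in string:
            updated_string = value  # the first matching key wins
            break
    converted.append(updated_string)
print(converted)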
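If more than one key could ever match (the question rules this out, but the caveat above anticipates it), one alternative to "first key wins" is to prefer the longest matching key. A minimal sketch; the tie-break rule and the extra key `"states"` in the usage example are hypothetical:

```python
dict_x = {"denmark": "dk", "germany": "ger", "norway": "no", "united states": "us"}

def convert_longest(s, mapping):
    # Collect every key that occurs in s, then keep the longest one.
    matches = [k for k in mapping if k in s]
    if not matches:
        return s  # no key found: leave the string unchanged
    return mapping[max(matches, key=len)]

# Hypothetical overlapping key: "states" is also contained in "united states".
extended = {**dict_x, "states": "st"}
print(convert_longest("that was in the united states", extended))  # us
print(convert_longest("nothing here", extended))  # nothing here
```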


淡お忘

Reply #3 · 2019-06-24 15:17

You can subclass dict and use a list comprehension.

In terms of performance, I advise you to try a few different methods and see what works best.

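class dict_contains(dict):
    # Substring lookup: return the value of the first key contained in `value`, else None
    def __getitem__(self, value):
        key = next((k for k in self.keys() if k in value), None)
        return self.get(key)  # dict.get does not go through this __getitem__, so no recursion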

str1 = "hello i am from denmark"
str2 = "that was in the united states"
str3 = "nothing here"

lst = [str1, str2, str3]

dict_x = dict_contains({"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"})

res = [dict_x[i] or i for i in lst]

# ['dk', 'us', 'nothing here']
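Following the advice to try a few methods, here is a minimal `timeit` harness, assuming the question's three sample strings repeated to stand in for the real data (the real timings will depend on the actual strings and keys). It compares the for/else loop and the `next()`/`dict.get()` comprehension from this thread, and checks that they agree before timing them:

```python
import timeit

# Hypothetical benchmark data: the question's strings, repeated.
dict_x = {"denmark": "dk", "germany": "ger", "norway": "no", "united states": "us"}
str_list = ["hello i am from denmark", "that was in the united states", "nothing here"] * 1000

def nested_loops(strings, mapping):
    """Inner for/else loop: append the value on a hit, else the original string."""
    out = []
    for s in strings:
        for key, value in mapping.items():
            if key in s:
                out.append(value)
                break
        else:
            out.append(s)
    return out

def next_get(strings, mapping):
    """List comprehension using next() with a default and dict.get() with a default."""
    return [mapping.get(next((k for k in mapping if k in s), None), s) for s in strings]

# Timing only makes sense if both methods produce the same result.
assert nested_loops(str_list, dict_x) == next_get(str_list, dict_x)

for func in (nested_loops, next_get):
    seconds = timeit.timeit(lambda: func(str_list, dict_x), number=10)
    print(f"{func.__name__}: {seconds:.3f} s for 10 runs")
```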


疯言疯语

Reply #4 · 2019-06-24 15:18

Assuming:

lst = ["hello i am from denmark", "that was in the united states", "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

You can do:

res = [dict_x.get(next((k for k in dict_x if k in my_str), None), my_str) for my_str in lst]

which returns:

print(res)  # -> ['dk', 'us', 'nothing here']

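The cool thing about this (apart from it being a Python ninja's favorite weapon, a.k.a. the list comprehension) is the combination of get with a default of my_str and next with a default of None: when no key matches, next returns None instead of raising StopIteration, and looking up None then falls back to the get default.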
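The two defaults can be seen in isolation with a small example (using a trimmed-down `dict_x`):

```python
dict_x = {"denmark": "dk", "united states": "us"}

# next() returns its second argument instead of raising StopIteration
# when the generator yields nothing.
no_match = next((k for k in dict_x if k in "nothing here"), None)
print(no_match)  # None

# None is not a key, so dict.get falls back to its default: the original string.
print(dict_x.get(no_match, "nothing here"))  # nothing here

# When a key is found, get returns the mapped value.
hit = next((k for k in dict_x if k in "hello i am from denmark"), None)
print(dict_x.get(hit, "hello i am from denmark"))  # dk
```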


萌系小妹纸

Reply #5 · 2019-06-24 15:19

This seems to be a good way:

input_strings = ["hello i am from denmark",
                 "that was in the united states",
                 "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

output_strings = []

for string in input_strings:
    for key, value in dict_x.items():
        if key in string:
            output_strings.append(value)
            break
    else:
        output_strings.append(string)
print(output_strings)
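The `else` attached to the inner `for` loop is what appends the unchanged string: it runs only when the loop finishes without hitting `break`. A minimal standalone illustration (the function name `describe` is just for this example):

```python
def describe(string, keys):
    for key in keys:
        if key in string:
            outcome = f"break on {key!r}"
            break
    else:
        # Reached only when the loop was never broken, i.e. no key matched.
        outcome = "loop completed, else ran"
    return outcome

print(describe("hello i am from denmark", ["germany", "denmark"]))  # break on 'denmark'
print(describe("nothing here", ["germany", "denmark"]))  # loop completed, else ran
```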


成全新的幸福

Reply #6 · 2019-06-24 15:20

Try

str_list = ["hello i am from denmark", "that was in the united states", "nothing here"]

dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

for k, v in dict_x.items():
    for i in range(len(str_list)):
        if k in str_list[i]:
            str_list[i] = v

print(str_list)

This iterates through the key, value pairs in your dictionary and looks to see if the key is in the string. If it is, it replaces the string with the value.
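Because the outer loop runs over the keys, every string is tested against every key, even after it has already been replaced; and if a replacement value happened to contain a later key, it would be replaced a second time. A minimal sketch, assuming the question's guarantee of at most one key per string and Python 3.7+ dict ordering, that stops testing a string once it has been replaced (the function name is hypothetical):

```python
def replace_in_place(strings, mapping):
    pending = set(range(len(strings)))  # indices not yet replaced
    for k, v in mapping.items():
        if not pending:
            break  # every string already replaced; skip the remaining keys
        for i in list(pending):  # iterate over a copy because we discard from the set
            if k in strings[i]:
                strings[i] = v
                pending.discard(i)
    return strings

str_list = ["hello i am from denmark", "that was in the united states", "nothing here"]
dict_x = {"denmark": "dk", "germany": "ger", "norway": "no", "united states": "us"}
print(replace_in_place(str_list, dict_x))  # ['dk', 'us', 'nothing here']
```

With a hypothetical mapping such as `{"denmark": "danish", "dan": "x"}`, the original loop would turn "from denmark" into "danish" and then into "x"; this version stops at "danish".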

