I have two lists:
list1 = ['abc-21-6/7', 'abc-56-9/10', 'def-89-7/3', 'hij-2-4/9', 'hij-75-1/7']
list2 = ['abc', 'hij']
I would like to subset list1 such that: 1) only those elements with substrings matching an element in list2 are retained, and 2) for duplicated elements that meet the first requirement, I want to randomly retain only one of the duplicates. For this specific example, I would like to produce a result such as:
['abc-21-6/7', 'hij-75-1/7']
I have worked out code to meet my first requirement:
[ele for ele in list1 for x in list2 if x in ele]
Which, based on my specific example, returns the following:
['abc-21-6/7', 'abc-56-9/10', 'hij-2-4/9', 'hij-75-1/7']
But I am stuck on the second step - how to randomly retain only one element in the case of duplicate substrings. I'm wondering if the random.choice function can somehow be incorporated into this problem? Any advice will be greatly appreciated!
You can use itertools.groupby:
import itertools
import random
list1 = ['abc-21-6/7', 'abc-56-9/10', 'def-89-7/3', 'hij-2-4/9', 'hij-75-1/7']
list2 = ['abc', 'hij']
new_list1 = [i for i in list1 if any(b in i for b in list2)]
new_data = [list(b) for a, b in itertools.groupby(new_list1, key=lambda x: x.split("-")[0])]
final_data = [random.choice(i) for i in new_data]
Output (one possible result, since the choice is random):
['abc-56-9/10', 'hij-75-1/7']
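One caveat with the groupby approach: itertools.groupby only merges adjacent items with equal keys, so it relies on list1 already being ordered by prefix. A minimal sketch for unordered input follows. It assumes the prefix is always the text before the first "-", and the helper name pick_one_per_prefix is made up for illustration. It also matches the prefix exactly rather than as a substring, which is a slightly stricter test than the one in the question.

```python
import itertools
import random


def pick_one_per_prefix(items, prefixes):
    """Keep one randomly chosen item per wanted prefix (hypothetical helper)."""
    def key(item):
        # Assumption: the prefix is everything before the first "-".
        return item.split("-")[0]

    # Keep only items whose prefix is one of the wanted ones.
    matching = [item for item in items if key(item) in prefixes]
    # groupby only merges adjacent equal keys, so sort by the key first.
    matching.sort(key=key)
    return [random.choice(list(group))
            for _, group in itertools.groupby(matching, key=key)]


unordered = ['hij-2-4/9', 'abc-21-6/7', 'def-89-7/3', 'hij-75-1/7', 'abc-56-9/10']
result = pick_one_per_prefix(unordered, ['abc', 'hij'])
print(result)
```

Because of the sort, the result is ordered by prefix rather than by the original position in the list.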
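You can use the following function, which returns the first element of list1 that contains the given substring (or None if nothing matches):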
def find(list1, findable):
    for element in list1:
        if findable in element:
            return element
Now we can use a list comprehension:
[find(list1, ele) for ele in list2 if find(list1, ele) is not None]
The comprehension calls find twice for every element of list2. A plain loop avoids the second call:
result = []
for ele in list2:
    found = find(list1, ele)
    if found is not None:
        result.append(found)
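Note that find always returns the first match, so the retained element is not random. A possible variant, sketched here under the assumption that collecting all matches per substring is acceptable, picks among them with random.choice as the question suggests. The name find_random is hypothetical.

```python
import random


def find_random(items, findable):
    """Return a random element of items that contains findable, or None."""
    matches = [element for element in items if findable in element]
    # random.choice raises IndexError on an empty list, so guard against it.
    return random.choice(matches) if matches else None


list1 = ['abc-21-6/7', 'abc-56-9/10', 'def-89-7/3', 'hij-2-4/9', 'hij-75-1/7']
list2 = ['abc', 'hij']

result = []
for ele in list2:
    found = find_random(list1, ele)
    if found is not None:
        result.append(found)
print(result)
```

The result follows the order of list2, with one randomly chosen match for each substring that occurs at all.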
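You can use a dictionary keyed by the elements of list2 instead of a list, and then convert its values to a list. Each later match overwrites the earlier one for the same key, so only one element per key survives.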
list1 = ['abc-21-6/7', 'abc-56-9/10', 'def-89-7/3', 'hij-2-4/9', 'hij-75-1/7']
list2 = ['abc', 'hij']
final_list = {pref:ele for pref in list2 for ele in list1 if pref in ele}
final_list = list(final_list.values())
This would output:
>>> final_list
['abc-56-9/10', 'hij-75-1/7']
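As written, the dictionary version is deterministic: it always ends up with the last match in list1 for each key. One way to make it random, offered as a sketch rather than as part of the original answer, is to iterate over a shuffled copy of list1 so that the "last" match is a random one. random.sample with the full length returns a shuffled copy and leaves list1 untouched.

```python
import random

list1 = ['abc-21-6/7', 'abc-56-9/10', 'def-89-7/3', 'hij-2-4/9', 'hij-75-1/7']
list2 = ['abc', 'hij']

# Shuffled copy; the original list1 keeps its order.
shuffled = random.sample(list1, len(list1))

# Later matches overwrite earlier ones, so each key keeps the match
# that happens to come last in the shuffled order.
picked = {pref: ele for pref in list2 for ele in shuffled if pref in ele}
final_list = list(picked.values())
print(final_list)
```

On Python 3.7 and later, dictionaries preserve insertion order, so final_list follows the order of list2.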