I am trying to match a list of 50 strings against a second list of 5 strings, each of which is part of one of the strings in the first list. I will post the complete code below for context, but here is a short example:
List1 = ['abcd12', 'efgh34', 'ijkl56', 'mnop78']
List2 = ['abc', 'ijk']
I want to return a list of the strings from List1 that have matches in List2. I have tried to do something with set.intersection, but it seems you can't do partial matches with it (or at least I can't with my limited abilities). I also tried any(), but I had no success making it work with my lists. My book says I should use a nested loop, but I don't know which function to use or how to apply it to lists.
Here is the complete code for reference:
#!/usr/bin/env python3.4
# -*- coding: utf-8 -*-
import random

def generateSequences(n):
    L = []
    dna = ["A", "G", "C", "T"]
    for i in range(int(n)):
        random_sequence = ''
        for _ in range(50):
            random_sequence += random.choice(dna)
        L.append(random_sequence)
    print(L)
    return L

def generatePrefixes(p, L):
    S = [x[:20] for x in L]
    D = []
    for i in range(p):
        randomPrefix = random.choice(S)
        D.append(randomPrefix)
    return S, D

if __name__ == "__main__":
    L = generateSequences(15)
    print(L)
    S, D = generatePrefixes(5, L)
    print(S)
    print(D)
Edit: As this was flagged as a possible duplicate, I want to point out that this post is about Python and the other one is about R. I don't know R or whether there are any similarities, but at first glance it doesn't look like it to me. Sorry for the inconvenience.
Using a nested for loop:
This may not be the most efficient way, but it works
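A minimal sketch of what such a nested loop could look like, using the short example from the question. The lowercase names list1, list2 and matches are illustrative, and substring matching with `in` is an assumption about what counts as a "match":

```python
list1 = ['abcd12', 'efgh34', 'ijkl56', 'mnop78']
list2 = ['abc', 'ijk']

matches = []
for full in list1:        # outer loop: every candidate string
    for part in list2:    # inner loop: every fragment to look for
        if part in full:  # substring test
            matches.append(full)
            break         # stop at the first hit so `full` is added only once

print(matches)  # ['abcd12', 'ijkl56']
```

The loop does len(list1) * len(list2) substring checks in the worst case, which is negligible for 50 strings against 5.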
Try
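One way this could be written is a list comprehension combined with any(), sketched here on the example lists from the question. The variable names are illustrative, and the second variant assumes the fragments are always prefixes, as they are in the DNA code:

```python
list1 = ['abcd12', 'efgh34', 'ijkl56', 'mnop78']
list2 = ['abc', 'ijk']

# Keep every string that contains at least one of the fragments.
result = [s for s in list1 if any(p in s for p in list2)]

# If the fragments are known to be prefixes, str.startswith accepts a
# tuple of candidates and avoids matching in the middle of a string.
prefix_result = [s for s in list1 if s.startswith(tuple(list2))]

print(result)
print(prefix_result)
```

Both variants keep the order of list1 and include each string at most once.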
You can just compare the strings directly. I collect the items from list1 that contain an item from list2 and skip any duplicates in the result list. This basically does what you want:
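A sketch of that idea, assuming plain substring comparison. The function name find_matches and the extra fragment 'bcd' are hypothetical and only there to show the duplicate handling:

```python
def find_matches(list1, list2):
    """Return the items of list1 that contain any item of list2, without duplicates."""
    result = []
    for part in list2:
        for full in list1:
            # Only add a string the first time one of the fragments matches it.
            if part in full and full not in result:
                result.append(full)
    return result


list1 = ['abcd12', 'efgh34', 'ijkl56', 'mnop78']
list2 = ['abc', 'ijk', 'bcd']  # 'abc' and 'bcd' both match 'abcd12'

found = find_matches(list1, list2)
print(found)  # ['abcd12', 'ijkl56']
```

Because the outer loop runs over list2, the result is ordered by which fragment matched first rather than by position in list1.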