I am trying to run the pairwise global alignment method in Biopython in a loop over about 10,000 pairs of strings. Each string is about 20 characters long on average. Running the method on a single pair of sequences works fine, but running it in a loop, for as few as 4 pairs, results in a segmentation fault. How can this be solved?
from Bio import pairwise2

def myTrial(source, targ):
    if source == targ:
        return [source, targ, source]
    alignments = pairwise2.align.globalmx(source, targ, 1, -0.5)
    return alignments

sour = ['najprzytulniejszy', 'sadystyczny', 'wyrzucić', 'świat']
targ = ['najprzytulniejszym', 'sadystycznemu', 'wyrzucisz', 'świat']
for i in range(4):
    a = myTrial(sour[i], targ[i])
The segmentation fault isn't happening because you are using a loop, but because you are providing non-ASCII characters as input to an alignment mode that takes ASCII string inputs only. Luckily, Bio.pairwise2.align.globalmx also permits aligning lists that contain arbitrary strings of ASCII and non-ASCII characters as tokens: for instance, aligning lists of strings such as ['ABC', 'ABD'] with ['ABC', 'GGG'], or, in your case, aligning lists of non-ASCII characters such as ['ś', 'w', 'i', 'a', 't'] and ['w', 'y', 'r', 'z', 'u', 'c', 'i', 's', 'z'].

To accomplish this with Biopython, replace the call pairwise2.align.globalmx(source, targ, 1, -0.5) in your code with one that passes list(source) and list(targ) instead of the raw strings and sets gap_char=['-'], so that the gap symbol is itself a list token.

With this change, the modified code will produce alignments for your inputs instead of a segmentation fault.
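A minimal sketch of the list-based call, assuming Biopython is installed in a version that still ships Bio.pairwise2 (the module is deprecated in recent releases). The helper name align_tokens and the use of one_alignment_only=True are illustrative choices, not part of the original code; the scoring (match 1, mismatch -0.5, no gap penalties) is the one from the question.

```python
from Bio import pairwise2


def align_tokens(source, target):
    """Globally align two strings as lists of single-character tokens.

    align_tokens is a hypothetical helper name used for illustration.
    """
    return pairwise2.align.globalmx(
        list(source),             # e.g. ['ś', 'w', 'i', 'a', 't']
        list(target),
        1,                        # match score, as in the question
        -0.5,                     # mismatch score, as in the question
        gap_char=["-"],           # the gap must be a list when aligning lists
        one_alignment_only=True,  # optional: return a single alignment
    )


sources = ['najprzytulniejszy', 'sadystyczny', 'wyrzucić', 'świat']
targets = ['najprzytulniejszym', 'sadystycznemu', 'wyrzucisz', 'świat']

for s, t in zip(sources, targets):
    best = align_tokens(s, t)[0]
    # best[0] and best[1] are the aligned token lists, best[2] is the score
    print(best[0], best[1], best[2])
```

Each returned alignment holds the two aligned token lists first, followed by the score and the start and end positions, so the gap tokens can be filtered out again to recover the original characters.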
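And since each token in the list is only one character long, you can also convert the resulting aligned lists back into strings by joining the tokens of each aligned list (for example with ''.join(...)) and storing the result in new_alignment. In the above example, new_alignment would then hold the aligned sequences as plain strings, as desired.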
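The conversion back to strings can be sketched as follows. The helper name stringify_alignment is hypothetical, and the alignment tuple is written by hand for illustration in the (seqA, seqB, score, begin, end) layout that pairwise2 returns; it is not actual Biopython output.

```python
def stringify_alignment(alignment):
    """Join every list field (an aligned token list) into a string.

    Non-list fields such as the score and the positions are kept as-is.
    """
    return tuple(
        "".join(field) if isinstance(field, list) else field
        for field in alignment
    )


# Hand-written example alignment (an assumption for illustration only).
alignment = (
    ['ś', 'w', 'i', 'a', 't', '-'],
    ['ś', 'w', 'i', 'a', 't', 'a'],
    5.0,
    0,
    6,
)

new_alignment = stringify_alignment(alignment)
print(new_alignment)  # ('świat-', 'świata', 5.0, 0, 6)
```

This only works cleanly because every token, including the gap, is a single character; with longer tokens the joined strings would no longer line up column by column.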