I have RNA sequences that contain various modified nucleotides and residues, for example N79, 8XU, SDG and I.
I want to align them pairwise using Biopython's `pairwise2.align.localms`. Is it possible to pass the input as a list rather than as a string, so that these modified bases are accounted for accurately?
What is the correct technique?
Sorry for adding another answer, but my reputation is not high enough to add comments.
To elaborate on peterjc's answer: accepting lists as input is the intended behaviour of `pairwise2` (and now I understand what it may be good for). And you are right, it is also about the `gap_char` argument: since you are passing the sequences as lists, the gap character must also be given as a list (`["-"]`).
Biopython's `pairwise2` module works on strings of letters, and those letters can be anything, not only A, C, G and U.
You can set the match and mismatch scores according to your needs. However, this assumes that each letter is a separate element, i.e. every (modified) base has to be written as a single character.
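A minimal sketch of the single-letter approach, assuming each multi-letter modified residue is mapped to a spare character that does not otherwise occur in the sequences. The `ENCODE` table and the `encode`/`decode` helpers are hypothetical illustrations, not part of Biopython, and the scores (match 2, mismatch -1, gap open -2, gap extend -0.5) are arbitrary example values. Recent Biopython releases deprecate `pairwise2` in favour of `Bio.Align.PairwiseAligner`, so importing it may print a deprecation warning.

```python
from Bio import pairwise2

# Hypothetical mapping: multi-letter modified-residue codes -> spare single
# characters. Unmodified bases (A, C, G, U) and single-letter codes such as I
# are passed through unchanged.
ENCODE = {"N79": "x", "8XU": "y", "SDG": "z"}
DECODE = {v: k for k, v in ENCODE.items()}


def encode(residues):
    """Turn a list of residue codes into a string with one character per residue."""
    chars = []
    for residue in residues:
        char = ENCODE.get(residue, residue)
        if len(char) != 1:
            raise ValueError(f"no single-letter code for residue {residue!r}")
        chars.append(char)
    return "".join(chars)


def decode(aligned):
    """Turn an aligned string back into residue codes; '-' stays a gap."""
    return [DECODE.get(char, char) for char in aligned]


seq1 = ["G", "N79", "A", "C", "8XU", "U"]
seq2 = ["G", "N79", "A", "C", "SDG", "U"]

# Arbitrary example scores: match=2, mismatch=-1, gap open=-2, gap extend=-0.5.
alignments = pairwise2.align.localms(encode(seq1), encode(seq2), 2, -1, -2, -0.5)
best = alignments[0]
print(decode(best.seqA))
print(decode(best.seqB))
print(best.score)
```

Because the modified residues become ordinary letters, any scoring scheme that works on strings can be used unchanged; the cost is maintaining the lookup table yourself.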
It was not clear from your question whether your example N79 is one modified nucleotide or three. If you want to treat N79 as one base, it does seem to be possible: I don't think it was intentional (so I wouldn't want to depend on this behaviour), but I could trick `pairwise2` into working on lists of strings.
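A minimal sketch of the list-based approach, with made-up sequences in which seq2 lacks one residue so that a gap appears. As noted in the other answer, `gap_char` is passed as a list (`["-"]`) so that gaps are inserted as list elements; the scores are arbitrary example values, not a recommendation.

```python
from Bio import pairwise2

# Each list element is one residue, so "N79" is treated as a single base.
seq1 = ["G", "N79", "A", "C", "8XU", "U"]
seq2 = ["G", "N79", "C", "8XU", "U"]

# Arbitrary example scores: match=2, mismatch=-1, gap open=-2, gap extend=-0.5.
# gap_char must be a list because the sequences are lists.
alignments = pairwise2.align.localms(seq1, seq2, 2, -1, -2, -0.5, gap_char=["-"])
best = alignments[0]
print(best.seqA)   # aligned seq1 as a list of residue codes
print(best.seqB)   # aligned seq2, with "-" elements where gaps were inserted
print(best.score)
```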
Note that the default `format_alignment` function does not display such list-based alignments very well.
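One workaround is to print the aligned lists yourself. The sketch below uses a hypothetical helper, `format_token_alignment`, that pads each column to its widest token so multi-letter codes stay lined up. It is plain Python and only assumes two equal-length aligned lists, such as `best.seqA` and `best.seqB` from a list-based alignment; newer Biopython versions may already format lists differently.

```python
def format_token_alignment(seq_a, seq_b, gap="-"):
    """Return a three-line view of two aligned lists of residue codes.

    Each column is padded to its widest token; "|" marks identical residues,
    "." a mismatch and a blank a gap.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    top, middle, bottom = [], [], []
    for a, b in zip(seq_a, seq_b):
        width = max(len(a), len(b))
        if a == gap or b == gap:
            mark = " "
        elif a == b:
            mark = "|"
        else:
            mark = "."
        top.append(a.ljust(width))
        middle.append(mark.ljust(width))
        bottom.append(b.ljust(width))
    # Join the padded columns with single spaces and drop trailing padding.
    return "\n".join(" ".join(row).rstrip() for row in (top, middle, bottom))


print(format_token_alignment(
    ["G", "N79", "A", "C", "8XU", "U"],
    ["G", "N79", "-", "C", "8XU", "U"],
))
# prints:
# G N79 A C 8XU U
# | |     | |   |
# G N79 - C 8XU U
```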