Simultaneous .replace functionality

Posted 2019-01-26 11:02

Question:

I have already converted the user's DNA input (A, T, G, C) into RNA code (A, U, G, C). This was fairly easy:

RNA_Code=DNA_Code.replace('T','U')

The next thing I need to do is convert RNA_Code into its complement strand. This means replacing A with U, U with A, G with C, and C with G, all simultaneously.

If I write

RNA_Code = RNA_Code.replace('A','U')
RNA_Code = RNA_Code.replace('U','A')

it converts all the As into Us, then all the Us (including the new ones) into As, so I end up with As in both positions.

I need it to take something like AUUUGCGGCAAA and convert it to UAAACGCCGUUU.

Any ideas on how to get this done? (Python 3.3)

Answer 1:

Use a translation table:

RNA_compliment = {
    ord('A'): 'U', ord('U'): 'A',
    ord('G'): 'C', ord('C'): 'G'}

RNA_Code.translate(RNA_compliment)

The str.translate() method takes a mapping from codepoint (a number) to replacement character. The ord() function gives us a codepoint for a given character, making it easy to build your map.

Demo:

>>> RNA_compliment = {ord('A'): 'U', ord('U'): 'A', ord('G'): 'C', ord('C'): 'G'}
>>> 'AUUUGCGGCAAA'.translate(RNA_compliment)
'UAAACGCCGUUU'
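The same kind of table can also be built with str.maketrans(), which pairs up two equal-length strings character by character and saves the ord() calls. A minimal sketch; the names RNA_COMPLEMENT_TABLE and rna_complement are made up for illustration and are not part of the question's code:

```python
# str.maketrans pairs characters by position: A->U, U->A, G->C, C->G.
RNA_COMPLEMENT_TABLE = str.maketrans("AUGC", "UACG")


def rna_complement(rna_code):
    """Return the complement of an RNA string (hypothetical helper)."""
    # translate() looks up every character in the original string once,
    # so an A that became a U is never converted back.
    return rna_code.translate(RNA_COMPLEMENT_TABLE)


print(rna_complement("AUUUGCGGCAAA"))  # UAAACGCCGUUU
```

Characters that are not in the table are left unchanged by translate().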


Answer 2:

You can use a mapping dictionary:

In [1]: dic={"A":"U","U":"A","G":"C","C":"G"}

In [2]: strs="AUUUGCGGCAAA"

In [3]: "".join(dic[x] for x in strs)
Out[3]: 'UAAACGCCGUUU'
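One caveat: dic[x] raises KeyError for any character that has no entry in the dictionary, such as a lowercase letter or a stray space. If you would rather pass such characters through unchanged, something like the following sketch might do; PAIRS and complement_lenient are hypothetical names, and passing unknown characters through is an assumption about what you want:

```python
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complement_lenient(strs):
    """Complement an RNA string, leaving unknown characters as they are."""
    # dict.get(x, x) falls back to the character itself when x has no entry.
    return "".join(PAIRS.get(x, x) for x in strs)


print(complement_lenient("AUUUGCGGCAAA"))  # UAAACGCCGUUU
```

Keeping the strict dic[x] lookup is the better choice if invalid input should fail loudly instead.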


Answer 3:

If you're not already using it, I suggest trying out Biopython. It has all sorts of functions for dealing with biological data, including a pretty cool Seq object. Its complement() method does what you're trying to do, there is a reverse_complement() method as well, and a bunch more that you might not even have thought of yet. Check it out, it's a real time-saver.

>>> # Bio.Alphabet was removed in Biopython 1.78; older versions took
>>> # an alphabet argument, as in Seq("AGTACACTGGT", generic_dna).
>>> from Bio.Seq import Seq
>>> my_dna = Seq("AGTACACTGGT")
>>> my_dna
Seq('AGTACACTGGT')
>>> my_dna.complement()
Seq('TCATGTGACCA')
>>> my_dna.reverse_complement()
Seq('ACCAGTGTACT')
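If installing Biopython is not an option, the two operations shown in the demo can be approximated in plain Python. This is only a sketch of the idea, not how Biopython implements it; DNA_TABLE, complement, and reverse_complement are names chosen here for illustration, and only uppercase A, T, G, C are handled:

```python
# Positional pairing: A->T, T->A, G->C, C->G.
DNA_TABLE = str.maketrans("ATGC", "TACG")


def complement(dna):
    """Complement each base, keeping the original order."""
    return dna.translate(DNA_TABLE)


def reverse_complement(dna):
    """Complement each base, then reverse the whole string."""
    return complement(dna)[::-1]


print(complement("AGTACACTGGT"))          # TCATGTGACCA
print(reverse_complement("AGTACACTGGT"))  # ACCAGTGTACT
```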


Answer 4:

I have a simple solution:

# Get the sequence from the user.
dna_seq = input("Please enter your sequence here: ")

# Read the sequence one nucleotide at a time and append each
# nucleotide's complement to a new string.
compliment = ""

for n in dna_seq:
    if n == "A":
        compliment += "T"
    elif n == "T":
        compliment += "A"
    elif n == "G":
        compliment += "C"
    elif n == "C":
        compliment += "G"
    # Any other character (lowercase, spaces, N, ...) is skipped.

print(compliment)