DNA to RNA and Getting Proteins with Perl

Posted 2019-07-04 00:26

Question:

I am working on a project (I have to implement it in Perl, but I am not good at it) that reads a DNA sequence and finds the corresponding RNA. It then divides that RNA into triplets to get the equivalent amino acid for each one. I will explain the steps:

1) Transcribe the following DNA to RNA, then use the genetic code to translate it to a sequence of amino acids

Example:

TCATAATACGTTTTGTATTCGCCAGCGCTTCGGTGT

2) To transcribe the DNA, first substitute each base with its complement (i.e., G for C, C for G, T for A and A for T):

TCATAATACGTTTTGTATTCGCCAGCGCTTCGGTGT
AGTATTATGCAAAACATAAGCGGTCGCGAAGCCACA

Next, remember that each thymine (T) base becomes a uracil (U). Hence our sequence becomes:

AGUAUUAUGCAAAACAUAAGCGGUCGCGAAGCCACA
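
The two transcription steps above can be sketched in Perl with the tr/// operator. This is only a minimal illustration, not a definitive implementation: the subroutine name transcribe is hypothetical, and the input is assumed to contain only the letters G, C, T and A.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (the name is an assumption, not from the original post).
sub transcribe {
    my ($dna) = @_;

    # Step 1: replace every base with its complement (G<->C, T<->A).
    my $complement = uc $dna;
    $complement =~ tr/GCTA/CGAT/;

    # Step 2: write uracil (U) wherever the complement has thymine (T).
    (my $rna = $complement) =~ tr/T/U/;

    return $rna;
}

print transcribe("TCATAATACGTTTTGTATTCGCCAGCGCTTCGGTGT"), "\n";
# prints AGUAUUAUGCAAAACAUAAGCGGUCGCGAAGCCACA
```

Both steps could also be merged into a single tr/GCTA/CGAU/; they are kept separate here only to mirror the description.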

To use the genetic code, first split the RNA into triplets:

AGU AUU AUG CAA AAC AUA AGC GGU CGC GAA GCC ACA

Then look each triplet (codon) up in the genetic code table. AGU becomes serine, which we can write as Ser, or just S. AUU becomes isoleucine (Ile), which we write as I. Carrying on in this way, we get:

SIMQNISGREAT
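
The codon lookup described above could be sketched as follows. It is a rough illustration under stated assumptions: the hash holds only the twelve codons that occur in this example rather than the full genetic code, and the names %codon_to_aa and translate are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Partial table: only the codons from the example (an assumption made to
# keep the sketch short; a real table needs all 64 codons).
my %codon_to_aa = (
    AGU => 'S', AUU => 'I', AUG => 'M', CAA => 'Q',
    AAC => 'N', AUA => 'I', AGC => 'S', GGU => 'G',
    CGC => 'R', GAA => 'E', GCC => 'A', ACA => 'T',
);

# Hypothetical helper: split the RNA into triplets and look each one up.
sub translate {
    my ($rna) = @_;

    # /(...)/g returns every complete triplet; leftover bases are ignored.
    my @codons = $rna =~ /(...)/g;

    # Codons missing from the table are shown as '?'.
    return join '',
        map { exists $codon_to_aa{$_} ? $codon_to_aa{$_} : '?' } @codons;
}

print translate("AGUAUUAUGCAAAACAUAAGCGGUCGCGAAGCCACA"), "\n";
# prints SIMQNISGREAT
```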

I will give the protein table:

So how can I write this in Perl? I will edit my question and add the code I have written so far.

Answer 1:

Try the script below. It accepts input on STDIN (or from a file given as a parameter) and reads it line by line. I also presume that "STOP" in the attached image is some kind of stop state. I hope I read everything correctly from that picture.

#!/usr/bin/perl
use strict;
use warnings;

# Codon => one-letter amino acid code. The stop codons are left out on purpose.
my %proteins = qw/
    UUU F UUC F UUA L UUG L UCU S UCC S UCA S UCG S UAU Y UAC Y UGU C UGC C UGG W
    CUU L CUC L CUA L CUG L CCU P CCC P CCA P CCG P CAU H CAC H CAA Q CAG Q CGU R CGC R CGA R CGG R
    AUU I AUC I AUA I AUG M ACU T ACC T ACA T ACG T AAU N AAC N AAA K AAG K AGU S AGC S AGA R AGG R
    GUU V GUC V GUA V GUG V GCU A GCC A GCA A GCG A GAU D GAC D GAA E GAG E GGU G GGC G GGA G GGG G
    /;

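# Read DNA line by line from STDIN or from the files named on the command line.
LINE: while (<>) {
    chomp;

    # Complement each base and write U instead of T in one pass (points 1 and 2 combined).
    y/GCTA/CGAU/;

    # Walk the RNA three bases at a time; an incomplete trailing codon is ignored.
    foreach my $codon (/(...)/g) {
        if (defined $proteins{$codon}) {
            print $proteins{$codon};
        }
        else {
            # Stop codons (and any unrecognised triplet) are not in %proteins and end up here.
            print "Whoops, stop state?\n";
            next LINE;
        }
    }
    print "\n";
}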
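
As a possible variation, the stop codons can be listed explicitly so that translation simply ends there instead of printing a warning. This is only a sketch under assumptions: it adds UAA, UAG and UGA (the stop codons of the standard genetic code) to the table above, marked with '*'. The subroutine name dna_to_protein and the use of 'X' for unrecognised triplets are choices made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same table as in the script above, plus the three stop codons marked '*'.
my %proteins = qw/
    UUU F UUC F UUA L UUG L UCU S UCC S UCA S UCG S UAU Y UAC Y UGU C UGC C UGG W
    CUU L CUC L CUA L CUG L CCU P CCC P CCA P CCG P CAU H CAC H CAA Q CAG Q CGU R CGC R CGA R CGG R
    AUU I AUC I AUA I AUG M ACU T ACC T ACA T ACG T AAU N AAC N AAA K AAG K AGU S AGC S AGA R AGG R
    GUU V GUC V GUA V GUG V GCU A GCC A GCA A GCG A GAU D GAC D GAA E GAG E GGU G GGC G GGA G GGG G
    UAA * UAG * UGA *
    /;

# Hypothetical helper: DNA string in, one-letter amino acid string out.
sub dna_to_protein {
    my ($dna) = @_;

    # Complement and T->U in one pass, as in the script above.
    (my $rna = uc $dna) =~ tr/GCTA/CGAU/;

    my $protein = '';
    for my $codon ($rna =~ /(...)/g) {
        my $aa = exists $proteins{$codon} ? $proteins{$codon} : 'X';
        last if $aa eq '*';    # stop codon: end translation here
        $protein .= $aa;
    }
    return $protein;
}

print dna_to_protein("TCATAATACGTTTTGTATTCGCCAGCGCTTCGGTGT"), "\n";
# prints SIMQNISGREAT
```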