perl Script to search for a motif in a multifasta

2019-06-27 09:45发布

问题:

I am able to search a motif in a multi fasta file and print the line containing the motif.... but i need to print all the sequences along with the header line of the motif containing fasta sequence. Please help me i am just a beginner in perl

#!usr/bin/perl -w
use strict;

print STDOUT "Enter the motif: ";
my $motif = <STDIN>;
chomp $motif;


my $line;
open (FILE, "data.fa");
while ($line = <FILE>) {
  if ($line =~ /$motif/)  {
     print $line;
   }
}

回答1:

Try this:

Bio::DB::Fasta

Instructions on the page. For more examples or instructions just search Google for: "use Bio::DB::Fasta"

To install this simply follow any of these instructions, I suggest using the CPAN.pm method as super user:

Installing Perl Modules



回答2:

Your script as written above doesn't remember the current sequence identifiers, so that you don't know which identifier is associated with each sequence.

I've modified your script below to read all of the FASTA sequences into a hash which maps ( identifier => sequence ), then iterate over that hash, printing out matches when appropriate. This will be an inappropriate approach for very large sequence files, but learning how to write little helper functions like this can be a very big speedup when writing new scripts to analyze data. It's also important to understand how to use and manipulate hashes and other data structures in Perl, as most code you encounter won't be written by beginners.

#!/usr/bin/perl

use strict;
use warnings;

print STDOUT "Enter the motif: ";
my $motif = <STDIN>;
chomp $motif;

my %seqs = %{ read_fasta_as_hash( 'data.fa' ) };
foreach my $id ( keys %seqs ) {
    if ( $seqs{$id} =~ /$motif/ ) {
        print $id, "\n";
        print $seqs{$id}, "\n";
    }
}

sub read_fasta_as_hash {
    my $fn = shift;

    my $current_id = '';
    my %seqs;
    open FILE, "<$fn" or die $!;
    while ( my $line = <FILE> ) {
        chomp $line;
        if ( $line =~ /^(>.*)$/ ) {
            $current_id  = $1;
        } elsif ( $line !~ /^\s*$/ ) { # skip blank lines
            $seqs{$current_id} .= $line
        }
    }
    close FILE or die $!;

    return \%seqs;
}


回答3:

@james_thompson's answer is great. I would use that if you're looking for something more versatile. If you're looking for a simpler version (perhaps for teaching?), this would also suffice - though note that this would miss the motif if there's a hard return in the middle.

#!usr/bin/perl -w
use strict;

print STDOUT "Enter the motif: ";
my $motif = <STDIN>;
chomp $motif;


my $line;
my $defline;
open (FILE, "data.fa");
while ($line = <FILE>) {
  if ($line =~ /^>/) {
     $defline = $line;
   } elsif ($line =~ /$motif/)  {
     print($defline,$line);
   }
}
close (FILE);

You'll note I also added an explicit close on the file handle.