perl Script to search for a motif in a multifasta

I am able to search a motif in a multi fasta file and print the line containing the motif.... but i need to print all the sequences along with the header line of the motif containing fasta sequence. Please help me i am just a beginner in perl

#!usr/bin/perl -w
use strict;

print STDOUT "Enter the motif: ";
my $motif = <STDIN>;
chomp $motif;


my $line;
open (FILE, "data.fa");
while ($line = <FILE>) {
  if ($line =~ /$motif/)  {
     print $line;
   }
}

标签： perl bioinformatics

3条回答

Rolldiameter

2楼-- · 2019-06-27 09:46

Try this:

Bio::DB::Fasta

Instructions on the page. For more examples or instructions just search Google for: "use Bio::DB::Fasta"

To install this simply follow any of these instructions, I suggest using the CPAN.pm method as super user:

Installing Perl Modules

0人赞添加讨论(0) 举报

祖国的老花朵

3楼-- · 2019-06-27 09:50

@james_thompson's answer is great. I would use that if you're looking for something more versatile. If you're looking for a simpler version (perhaps for teaching?), this would also suffice - though note that this would miss the motif if there's a hard return in the middle.

#!usr/bin/perl -w
use strict;

print STDOUT "Enter the motif: ";
my $motif = <STDIN>;
chomp $motif;


my $line;
my $defline;
open (FILE, "data.fa");
while ($line = <FILE>) {
  if ($line =~ /^>/) {
     $defline = $line;
   } elsif ($line =~ /$motif/)  {
     print($defline,$line);
   }
}
close (FILE);

You'll note I also added an explicit close on the file handle.

0人赞添加讨论(0) 举报

ゆ、 Hurt°

4楼-- · 2019-06-27 10:01

Your script as written above doesn't remember the current sequence identifiers, so that you don't know which identifier is associated with each sequence.

I've modified your script below to read all of the FASTA sequences into a hash which maps ( identifier => sequence ), then iterate over that hash, printing out matches when appropriate. This will be an inappropriate approach for very large sequence files, but learning how to write little helper functions like this can be a very big speedup when writing new scripts to analyze data. It's also important to understand how to use and manipulate hashes and other data structures in Perl, as most code you encounter won't be written by beginners.

#!/usr/bin/perl

use strict;
use warnings;

print STDOUT "Enter the motif: ";
my $motif = <STDIN>;
chomp $motif;

my %seqs = %{ read_fasta_as_hash( 'data.fa' ) };
foreach my $id ( keys %seqs ) {
    if ( $seqs{$id} =~ /$motif/ ) {
        print $id, "\n";
        print $seqs{$id}, "\n";
    }
}

sub read_fasta_as_hash {
    my $fn = shift;

    my $current_id = '';
    my %seqs;
    open FILE, "<$fn" or die $!;
    while ( my $line = <FILE> ) {
        chomp $line;
        if ( $line =~ /^(>.*)$/ ) {
            $current_id  = $1;
        } elsif ( $line !~ /^\s*$/ ) { # skip blank lines
            $seqs{$current_id} .= $line
        }
    }
    close FILE or die $!;

    return \%seqs;
}

0人赞添加讨论(0) 举报

perl Script to search for a motif in a multifasta

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间