I am able to search for a motif in a multi-FASTA file and print the line containing the motif, but I need to print the whole sequence, along with its header line, for every FASTA record that contains the motif. Please help; I am just a beginner in Perl.
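#!/usr/bin/perl -w
use strict;
print STDOUT "Enter the motif: ";
my $motif = <STDIN>;
chomp $motif;
my $line;
# Stop with an error message if data.fa cannot be opened
open(FILE, '<', "data.fa") or die "Cannot open data.fa: $!";
while ($line = <FILE>) {
    # Prints only the matching line, not the record's header or full sequence
    if ($line =~ /$motif/) {
        print $line;
    }
}
close FILE;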
Try the Bio::DB::Fasta module (part of BioPerl):
Bio::DB::Fasta
Usage instructions are on the module's documentation page. For more examples, just search Google for "use Bio::DB::Fasta".
To install it, follow any of the instructions in "Installing Perl Modules"; I suggest the CPAN.pm method, run as superuser.
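As a rough sketch of how Bio::DB::Fasta could be applied to this question, assuming BioPerl is installed: the module indexes the FASTA file, returns each sequence with its line breaks already joined, and gives you the header for each ID. The helper name `motif_hits` and the command-line handling are hypothetical illustrations, not part of the module, and the motif is matched as a literal string via `\Q...\E`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::Fasta;

# Hypothetical helper: return [header, sequence] pairs for every record in
# $file whose sequence contains $motif (matched as a literal string).
sub motif_hits {
    my ($file, $motif) = @_;
    # Builds an index on first use; the index file is written next to $file.
    my $db  = Bio::DB::Fasta->new($file);
    my @ids = $db->get_all_primary_ids;
    my @hits;
    for my $id (sort @ids) {
        my $seq = $db->seq($id);    # whole sequence as one string
        push @hits, [ $db->header($id), $seq ] if $seq =~ /\Q$motif\E/;
    }
    return @hits;
}

# Usage (hypothetical script name): perl bio_motif.pl MOTIF data.fa
if (@ARGV == 2) {
    my ($motif, $file) = @ARGV;
    for my $hit (motif_hits($file, $motif)) {
        my ($header, $seq) = @$hit;
        print ">$header\n$seq\n";
    }
}
```

Because the module hands back each sequence as one string, a motif split across two lines of the file is still found.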
@james_thompson's answer is great; use it if you need something more versatile. If you want a simpler version (perhaps for teaching?), this would also suffice, though note that it would miss the motif if a line break (hard return) falls in the middle of it.
You'll note I also added an explicit close on the file handle.
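For comparison, here is one simple module-free way to do it. It is a sketch, not the code this answer originally referred to. It extends the loop from the question: it remembers the most recent header line, collects the sequence lines that follow, and keeps the whole record when the motif is found. The search runs on a copy of the sequence with line breaks removed, so it avoids the hard-return limitation mentioned above. The sub name `matching_records` and reading the motif and file name from the command line are my own choices; the motif is matched literally via `\Q...\E` (remove that to allow regex motifs).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every FASTA record (header line plus its sequence lines, as read)
# from filehandle $fh whose sequence contains $motif as a literal string.
sub matching_records {
    my ($fh, $motif) = @_;
    my @hits;
    my ($header, $seq_lines) = (undef, '');

    # Check the record collected so far and keep it if it matches.
    my $flush = sub {
        return unless defined $header;
        # Remove line breaks so a motif split across lines is still found.
        (my $joined = $seq_lines) =~ s/\s+//g;
        push @hits, $header . $seq_lines if $joined =~ /\Q$motif\E/;
    };

    while (my $line = <$fh>) {
        if ($line =~ /^>/) {        # a header line starts a new record
            $flush->();
            ($header, $seq_lines) = ($line, '');
        } else {
            $seq_lines .= $line;    # sequence line of the current record
        }
    }
    $flush->();                     # check the last record in the file
    return @hits;
}

# Usage (hypothetical script name): perl find_motif.pl MOTIF data.fa
if (@ARGV == 2) {
    my ($motif, $file) = @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    print for matching_records($fh, $motif);
    close $fh;
}
```

This reads the file one line at a time and holds only the current record in memory, so it also works on large files.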
Your script, as written above, doesn't remember the current sequence identifier, so you can't tell which identifier each matching line belongs to.
I've modified your script below to read all of the FASTA sequences into a hash that maps ( identifier => sequence ), then iterate over that hash, printing out the matches. Because this holds every sequence in memory at once, it is unsuitable for very large sequence files, but learning to write small helper functions like this can save a lot of time when writing new scripts to analyze data. It's also important to understand how to use and manipulate hashes and other data structures in Perl, as most code you encounter won't be written by beginners.
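The modified script itself isn't reproduced here, so the following is only a minimal sketch of the hash-based approach described above. `read_fasta` and `ids_with_motif` are hypothetical helper names. It assumes the identifier is the first whitespace-delimited word of the header line, and a later record with the same identifier overwrites the earlier one. The motif is matched literally via `\Q...\E`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read every record from $fh into a hash: identifier => sequence.
# Line breaks inside a sequence are removed, so each value is one string.
sub read_fasta {
    my ($fh) = @_;
    my %seqs;
    my $id;                          # identifier of the record being read
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(\S*)/) {    # header: start a new entry
            $id = $1;
            $seqs{$id} = '';         # a duplicate identifier resets the entry
        } elsif (defined $id) {      # sequence line: append to current entry
            $line =~ s/\s+//g;       # drop stray whitespace such as "\r"
            $seqs{$id} .= $line;
        }
    }
    return \%seqs;
}

# Return, in sorted order, the identifiers whose sequence contains $motif.
sub ids_with_motif {
    my ($seqs, $motif) = @_;
    return grep { $seqs->{$_} =~ /\Q$motif\E/ } sort keys %$seqs;
}

# Usage (hypothetical script name): perl hash_motif.pl MOTIF data.fa
if (@ARGV == 2) {
    my ($motif, $file) = @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $seqs = read_fasta($fh);
    close $fh;
    print ">$_\n$seqs->{$_}\n" for ids_with_motif($seqs, $motif);
}
```

Keeping the reading step in its own sub means you can reuse `read_fasta` in other scripts that need every sequence indexed by identifier.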