I have a hash called %id2seq that contains strings of DNA sequences that are referenced by the key $id. I want to be able to manipulate the DNA sequences by using a position within the string as a reference. For example, if my DNA sequence was ACGTG, my $id would be Sequence 1, my $id2seq{'Sequence 1'} would be ACGTG, and my "theoretical" $id2seq{'Sequence 1'}[3] would be G.
I am attempting to create a hash of arrays to do this, but I'm getting a weird output (see the output below). I'm pretty sure that it's just my formatting. Any input is helpful, and I appreciate it in advance.
Here is a snippet of the input file:
>Sequence 1
TCAGAACCAGTTATAAATTTATCATTTCCTTCTCCACTCCT
>Sequence 2
CCCACGCAGCCGCCCTCCTCCCCGGTCACTGACTGGTCCTG
>Sequence 3
TCGACCCTCTGGAACCTATCAGGGACCACAGTCAGCCAGGCAAG
Here is a snippet of my attempt at the moment. (The line I previously used to build a plain hash of sequence strings is commented out.)
use strict;
use warnings;
print "Please enter the filename of the fasta sequence data: ";
my $filename1 = <STDIN>;
#Remove newline from file
chomp $filename1;
#Open the file and store each dna seq in hash
my %id2seq = ();
my $id = '';
open (FILE, '<', $filename1) or die "Cannot open $filename1.",$!;
my $dna;
while (<FILE>)
{
    if($_ =~ /^>(.+)/)
    {
        $id = $1;
    }
    else
    {
        ## $id2seq{$id} = $_; used to create hash table
        @seqs = split '', $_;
        $id2seq{$id} = [ @seqs ];
    }
}
close FILE;
foreach $id (keys %id2seq)
{
    print "$id2seq{$id}[@seqs]\n\n";
}
Output
Use of uninitialized value in concatenation (.) or string at line 37.
T
G
A
T
T
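@seqs contains the characters from the last sequence read. Inside the subscript it is evaluated in scalar context, so $id2seq{$id}[@seqs] actually means $id2seq{$id}[N], where N is the number of elements in @seqs (the length of the last line read). So you print only one character from each sequence, and you get a warning whenever a sequence has no element at that index, that is, when it is not longer than the last one. If you print only for debugging, it is easier to dump the whole structure at once. Otherwise you have to iterate over $id2seq{$id} yourself in a nested loop.

You need to print $id2seq{$id}[3] to get the fourth value, since array indexes start at 0. Also, you never declared @seqs with 'my', so the script as posted will not compile under strict. The 'Use of uninitialized value in concatenation (.) or string at line 37.' warning itself comes from reading an index past the end of an array. Declare @seqs with 'my' rather than removing warnings/strict.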
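The printing options mentioned above (dumping for debugging, a nested loop, and a single index) can be sketched as follows. This is only a minimal sketch: the hard-coded %id2seq stands in for the data read from the file, the sub name format_sequences is hypothetical, and using the core module Data::Dumper for the debugging dump is an assumption, not something the answers name.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample data: a hash of array refs, one character per element.
my %id2seq = (
    'Sequence 1' => [ split //, 'ACGTG' ],
    'Sequence 2' => [ split //, 'TTGA' ],
);

# Debugging only: dump the whole nested structure in one call.
$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper(\%id2seq);

# Nested loop: outer loop over ids, inner loop over each base.
# Returns one "id: bases" line per sequence (hypothetical helper).
sub format_sequences {
    my ($seqs) = @_;
    my @lines;
    for my $id (sort keys %$seqs) {
        my $line = "$id: ";
        for my $base (@{ $seqs->{$id} }) {
            $line .= $base;
        }
        push @lines, $line;
    }
    return @lines;
}

print "$_\n" for format_sequences(\%id2seq);

# A single position is a plain 0-based index into the array ref.
print $id2seq{'Sequence 1'}[3], "\n";    # fourth base of ACGTG, prints T
```

Because indexes are 0-based, position 3 of ACGTG is T; the third base, G, is at index 2.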
This line is incorrect:
print "$id2seq{$id}[@seqs]\n\n";
$id2seq{$id} is an array ref, so the correct way to print it is to dereference the whole array, for example:
print "@{ $id2seq{$id} }\n\n";
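A complete example along these lines might look like the sketch below. It makes a few assumptions: the sub name read_fasta is hypothetical, the sample records are the first two from the question, and the data is read through an in-memory filehandle instead of prompting for a filename so that the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from an open filehandle and
# return a hash ref mapping each id to an array ref of single characters.
sub read_fasta {
    my ($fh) = @_;
    my (%id2seq, $id);
    while (my $line = <$fh>) {
        chomp $line;                      # drop the trailing newline
        if ($line =~ /^>(.+)/) {
            $id = $1;                     # header line: start a new record
        }
        elsif (defined $id) {
            # Append, so sequences wrapped over several lines still work.
            push @{ $id2seq{$id} }, split //, $line;
        }
    }
    return \%id2seq;
}

# Sample data taken from the question, read through an in-memory filehandle.
my $fasta = <<'END';
>Sequence 1
TCAGAACCAGTTATAAATTTATCATTTCCTTCTCCACTCCT
>Sequence 2
CCCACGCAGCCGCCCTCCTCCCCGGTCACTGACTGGTCCTG
END

open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
my $id2seq = read_fasta($fh);
close $fh;

for my $id (sort keys %$id2seq) {
    my @bases = @{ $id2seq->{$id} };
    print "$id: ", join('', @bases), "\n";    # whole sequence
    print "fourth base: $bases[3]\n";         # index 3 is the fourth base
}
```

Declaring every variable with my keeps the script valid under strict, and chomp keeps the newline out of the character arrays.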