I have a hash whose keys should each map to their own array. To be more specific, the hash keys are quality values and the arrays hold sequence names. If there is already an array for a given quality, I want to add the sequence name to that array. If there isn't one, I want to create it and add the sequence name to it. All this is done in a while loop that goes through the sequences one by one.
I've tried to follow the question "Perl How do I retrieve an array from a hash of arrays?", but I can't seem to get it right.
I just get these error messages:
Scalar value @{hash{$q} better written as ${hash{$q} at asdasd.pl line 69.
Global symbol "@q" requires explicit package name asdasd.pl line 58.
And some others, too.
Here is an example of what I've tried:
my %hash;
while (reading the sequences) {
    my $q = "the value the sequence has";
    my $seq = "the name of the sequence";
    if (exists $hash{$q}) {
        push (@{$hash{$q}}, $seq);
    } else {
        $hash{$q} = \@q;
        $hash{$q} = [$seq];
        next;
    }
}
This obviously shouldn't be a very complicated problem, but I'm new to Perl and it feels difficult. I've searched in various places, but there seems to be something I'm just not seeing, and it may be really obvious, too.
You can use what Perl calls autovivification to make this quite easy. Your code doesn't need that central if-statement. You can boil it down to:
push @{ $hash{$q} }, $seq;
If the key doesn't yet exist in the hash, Perl will autovivify it: because you dereference $hash{$q} as an array, Perl infers that you want an array reference there and creates an empty one before pushing.
You can find further resources on autovivification by Googling it. It's a unique enough word that the vast majority of the hits seem relevant. :-)
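As a minimal sketch of the one-liner in context, the loop below groups a few made-up (quality, sequence name) pairs. The @records list and its values are hypothetical stand-ins for whatever your loop actually reads.

```perl
use strict;
use warnings;

my %hash;

# Hypothetical (quality, sequence name) pairs, made up for illustration.
my @records = ( [ 30, 'seq1' ], [ 20, 'seq2' ], [ 30, 'seq3' ] );

for my $record (@records) {
    my ( $q, $seq ) = @$record;

    # No exists check needed: dereferencing $hash{$q} as an array
    # creates an empty anonymous array the first time a key is seen.
    push @{ $hash{$q} }, $seq;
}

# Sort numerically so the output order is predictable.
for my $q ( sort { $a <=> $b } keys %hash ) {
    print "$q => ", join( ', ', @{ $hash{$q} } ), "\n";
}

# prints
# 20 => seq2
# 30 => seq1, seq3
```

Note that the key 20 never had an explicit array assigned to it; the push alone was enough.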
You are actually pretty close. A few notes, though:
In your else block you assign a reference to @q to $hash{$q}, then immediately overwrite it with [$seq]. Only the last assignment takes effect, and because @q is never declared, use strict rejects it with the 'Global symbol "@q"' error.
You don't need next at the end of your loop. The loop moves on to the next iteration by itself once there are no more statements left to execute in the body.
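The point about the two assignments in the else block can be checked in isolation. This is only a sketch: the key 30 and the contents of @q are made up, and @q is declared here solely so that the snippet compiles under strict.

```perl
use strict;
use warnings;

my %hash;
my @q = ('placeholder');    # hypothetical array, declared so strict is satisfied

$hash{30} = \@q;            # first assignment: a reference to @q
$hash{30} = ['seq1'];       # second assignment replaces that reference

print "hash: ", join( ', ', @{ $hash{30} } ), "\n";
print "q: ", join( ', ', @q ), "\n";

# prints
# hash: seq1
# q: placeholder
```

The hash ends up pointing at the new anonymous array, and @q is left untouched, so the first assignment can simply be dropped.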
Everything else seems to work fine. Here are my revisions and the test data I used (since I don't know anything about DNA sequences, I just used letters I remember from high school biology):
Input file:
A 1
T 2
G 3
A 3
A 2
G 5
C 1
C 1
C 2
T 4
Code:
use strict;
use warnings FATAL => 'all';

# open file for reading, and stop with the system error if that fails
open(my $fh, '<', 'test.txt') or die "Cannot open test.txt: $!";

my %hash;
while ( my $line = <$fh> ) { # read a line
    # split the line into its two whitespace-separated fields:
    # the first becomes the hash key ($q), the second is collected under it ($seq)
    my ($q, $seq) = split(/\s+/, $line);
    if( exists $hash{$q} ) {
        push @{ $hash{$q} }, $seq;
    }
    else {
        $hash{$q} = [$seq];
    }
}
close $fh;

# print the resulting hash; keys come back in no particular order, so sort them
for my $k ( sort keys %hash ) {
    print "$k : ", join(', ', @{$hash{$k}}), "\n";
}

# prints
# A : 1, 3, 2
# C : 1, 1, 2
# G : 3, 5
# T : 2, 4
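The test above groups by the first column. If, as in the question, the hash should be keyed by quality, a variant could look like the sketch below. It assumes the first column of each line is the sequence name and the second is its quality; the input lines are inlined in a hypothetical @lines array so the snippet runs without test.txt, and %by_quality is a name chosen for illustration.

```perl
use strict;
use warnings;

# The same ten lines as the test input, inlined so the sketch runs on its own.
my @lines = ( 'A 1', 'T 2', 'G 3', 'A 3', 'A 2', 'G 5', 'C 1', 'C 1', 'C 2', 'T 4' );

my %by_quality;
for my $line (@lines) {
    # Assumption: first column is the sequence name, second is its quality.
    my ( $seq, $q ) = split ' ', $line;

    # Autovivification creates the array for a quality on first use.
    push @{ $by_quality{$q} }, $seq;
}

# Sort the qualities numerically for a stable listing.
for my $q ( sort { $a <=> $b } keys %by_quality ) {
    print "$q : ", join( ', ', @{ $by_quality{$q} } ), "\n";
}

# prints
# 1 : A, C, C
# 2 : T, A, C
# 3 : G, A
# 4 : T
# 5 : G
```

Only the order of the two variables in the split changes; the grouping logic is the same one-line push.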