Question:
I'm representing the nucleotides A, C, G, T as 0, 1, 2, 3, and afterwards I need to convert the resulting sequence, read as a quaternary (base-4) number, to decimal. Is there a way to do this in Perl? I'm not sure whether pack/unpack can do it.
Answer 1:
Each base-4 digit takes exactly 2 bits, so the conversion is easy to handle efficiently.
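my $uvsize = length(pack('J>', 0)) * 8;    # width of a native unsigned integer, in bits
my %base4to2 = map { $_ => sprintf('%02b', $_) } 0..3;    # '0' => '00' ... '3' => '11'

sub base4to10 {
    my ($s) = @_;
    $s =~ s/(.)/$base4to2{$1}/sg;                   # each base-4 digit becomes two bits
    $s = substr(("0" x $uvsize) . $s, -$uvsize);    # left-pad with zeros to the full width
    return unpack('J>', pack('B*', $s));            # read the bit string as a big-endian integer
}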
This handles inputs of up to 16 digits on builds with 32-bit integers, and up to 32 digits on builds with 64-bit integers.
It's possible to support slightly longer inputs by using floating-point numbers: 26 digits on builds with IEEE doubles, 56 digits on builds with IEEE quads. This would require a different implementation.
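One way such a floating-point variant might look, as a minimal sketch only: base4to10_float is a hypothetical name, and the 26-digit cap assumes the build stores floating-point values as IEEE doubles.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the answer above.
# Accumulates the value digit by digit with ordinary arithmetic. On builds
# with 32-bit integers, Perl switches to a floating-point value by itself
# once the result no longer fits in a native integer.
sub base4to10_float {
    my ($s) = @_;
    die "not a base-4 string: $s\n" unless $s =~ /\A[0-3]+\z/;

    # Leading zeros carry no value, so they do not count toward the limit.
    (my $significant = $s) =~ s/\A0+(?=.)//;

    # Assumption: IEEE doubles (53-bit mantissa), so 26 digits stay exact.
    die "too long for an exact double: $s\n" if length($significant) > 26;

    my $n = 0;
    $n = $n * 4 + $_ for split //, $significant;
    return $n;
}

printf "%.0f\n", base4to10_float('123');    # prints 27
```

If the value ends up stored as a floating-point number, print it with printf and '%.0f' as above, because a plain print rounds such values to 15 significant digits.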
Inputs longer than these limits require a module such as Math::BigInt to hold the result.
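For arbitrarily long inputs, a sketch along the following lines could work. It uses the core Math::BigInt module with the same multiply-and-add idea; base4to10_big is a hypothetical name chosen for illustration, not something from the answer.

```perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical helper: convert a base-4 digit string of any length.
sub base4to10_big {
    my ($s) = @_;
    die "not a base-4 string: $s\n" unless $s =~ /\A[0-3]+\z/;

    my $n = Math::BigInt->new(0);
    for my $digit (split //, $s) {
        # bmul and badd modify $n in place and return it, so they chain.
        $n->bmul(4)->badd($digit);
    }
    return $n;    # a Math::BigInt object; it stringifies to decimal
}

print base4to10_big('3' x 40), "\n";    # 1208925819614629174706175 (2**80 - 1)
```

The return value is an object rather than a plain number, so arithmetic on it stays exact, at the cost of being slower than native integers.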
A faster and simpler version of base4to10 converts each pair of base-4 digits into one hexadecimal digit:
my %base4to16 = (
    '00' => '0',  '10' => '4',  '20' => '8',  '30' => 'C',
    '01' => '1',  '11' => '5',  '21' => '9',  '31' => 'D',
    '02' => '2',  '12' => '6',  '22' => 'A',  '32' => 'E',
    '03' => '3',  '13' => '7',  '23' => 'B',  '33' => 'F',
);

sub base4to10 {
    my ($s) = @_;
    $s = "0$s" if length($s) % 2;       # pad to an even length so pairs align from the right
    $s =~ s/(..)/$base4to16{$1}/sg;     # two base-4 digits make one hex digit
    return hex($s);
}
Answer 2:
I've never used it, but it looks like the Convert::BaseN module would be a good choice. Its description reads: "Convert::BaseN - encoding and decoding of base{2,4,8,16,32,64} strings".
Answer 3:
It is very simple to convert a base-4 string to decimal by processing each digit in a loop.
Note that, on a Perl built with 32-bit integers, a sequence longer than sixteen bases no longer fits in a native integer.
This code shows the idea:
use strict;
use warnings;

print seq2dec('ACGTACGTACGTACGT');

sub seq2dec {
    my ($sequence) = @_;
    my $n = 0;
    # index() maps A, C, G, T to 0, 1, 2, 3
    for (map { index 'ACGT', $_ } split //, $sequence) {
        $n = $n * 4 + $_;
    }
    return $n;
}
Output:
454761243
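One caveat with the loop above is that index returns -1 for any character other than A, C, G or T (an N, for example), which silently corrupts the result. A guarded variant might look like the sketch below; seq2dec_checked is a hypothetical name, and rejecting such input with die is an assumption about the desired behaviour.

```perl
use strict;
use warnings;

# Hypothetical variant of seq2dec that refuses unexpected characters.
sub seq2dec_checked {
    my ($sequence) = @_;
    die "unexpected character in sequence: $sequence\n"
        unless $sequence =~ /\A[ACGT]+\z/;

    # Map the nucleotides to base-4 digits in one pass.
    (my $digits = $sequence) =~ tr/ACGT/0123/;

    my $n = 0;
    $n = $n * 4 + $_ for split //, $digits;
    return $n;
}

print seq2dec_checked('ACGT'), "\n";    # prints 27
```

Because A maps to 0, leading As do not change the value: 'AACG' and 'CG' both give 6. If the sequence length matters, it has to be stored alongside the number.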