Generating sets of arrays in Perl

Posted 2019-09-21 23:13

The given Perl script cuts the input sequence at "E", skips the particular positions of "E" that are listed in @nobreak, and generates an array of fragments as output. But I want a script that generates a set of such arrays, one for every position in @nobreak, skipping only that position each time. Say set 1 contains the fragments obtained after skipping the "E" at 37, set 2 after skipping the "E" at 45, and so on. The script below, which I wrote, is not working correctly. I want to generate 4 different arrays in the output, taking one position of @nobreak at a time. Please help!

my $s = 'MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN';

print "Results of 1-Missed Cleavage:\n\n";

my @nobreak = (37, 45, 57, 59);
{
    @nobreak = map { $_ - 1 } @nobreak;

    foreach (@nobreak) {

        substr($s, $_, 1) = "\0";
    } 
    my @a   = split /E(?!P)/, $s;
    $_      =~ s/\0/E/g foreach (@a);
    $result = join "E,", @a; 
    @final  = split /,/, $result;
    print "@final\n";
}

3 Answers
Aperson
Post #2 · 2019-09-21 23:15

It looks like you want to split the string after every E character, but not before a P character.

This code will do what you want. It works by changing the E at each offset in @nobreak to a lowercase e (much easier to debug than "\0") and then splitting on /(?<=E)(?!P)/, i.e. after an E but not before a P. The e is changed back to an E afterwards using tr/e/E/.

use strict;
use warnings;

my $s = 'MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN';

print "Results of 1-Missed Cleavage:\n\n";

my @nobreak = (37, 45, 57, 59);

for my $index (@nobreak) {
  my $ss = $s;
  substr($ss, $index-1, 1) = 'e';
  my @final = split /(?<=E)(?!P)/, $ss;
  tr/e/E/ for @final;
  print "$_\n" for @final;
  print "\n";
}

Output:

Results of 1-Missed Cleavage:

MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGE
RGFFYTPKTRRE
AE
DLQVGQVE
LGGGPGAGSLQPLALE
GSLQKRGIVE
QCCTSICSLYQLE
NYCN

MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVE
ALYLVCGERGFFYTPKTRRE
AE
DLQVGQVE
LGGGPGAGSLQPLALE
GSLQKRGIVE
QCCTSICSLYQLE
NYCN

MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVE
ALYLVCGE
RGFFYTPKTRREAE
DLQVGQVE
LGGGPGAGSLQPLALE
GSLQKRGIVE
QCCTSICSLYQLE
NYCN

MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVE
ALYLVCGE
RGFFYTPKTRRE
AEDLQVGQVE
LGGGPGAGSLQPLALE
GSLQKRGIVE
QCCTSICSLYQLE
NYCN
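The question asks for one array per skipped position rather than printed output. A minimal sketch, assuming a hash keyed by the 1-based position is an acceptable container, wraps the approach above in a sub (the name `fragments_skipping` is hypothetical) and stores each set as an array reference:

```perl
use strict;
use warnings;

my $s = 'MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN';
my @nobreak = (37, 45, 57, 59);

# Hypothetical helper: split $seq after every E not followed by P,
# except the E at 1-based position $pos, which is left uncut.
sub fragments_skipping {
    my ($seq, $pos) = @_;               # $seq is a copy; the caller's string is untouched
    substr($seq, $pos - 1, 1) = 'e';    # mask the E that must not be cut
    my @frags = split /(?<=E)(?!P)/, $seq;
    tr/e/E/ for @frags;                 # restore the masked E (assumes no lowercase e in the input)
    return @frags;
}

# One set (array reference) per skipped position
my %sets = map { $_ => [ fragments_skipping($s, $_) ] } @nobreak;

for my $pos (sort { $a <=> $b } keys %sets) {
    print "Set skipping E at $pos:\n";
    print "  $_\n" for @{ $sets{$pos} };
}
```

Each set can then be used directly, e.g. `@{ $sets{45} }` is the array of fragments produced when the E at position 45 is skipped.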
贼婆χ
Post #3 · 2019-09-21 23:32

Loop over @nobreak?

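use strict;
use warnings;

my $s = 'MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN';
print "Results of 1-Missed Cleavage:\n\n";
my @nobreak = (37,45,57,59);
for my $nobreak (@nobreak) {
    # Temporarily mask the E at this (1-based) position so it is not split on
    substr($s, $nobreak-1, 1) = "\0";
    my @a = split(/E(?!P)/, $s);
    # Restore the original E before the next iteration
    substr($s, $nobreak-1, 1) = 'E';
    $_ =~ s/\0/E/g foreach (@a);
    # Re-append the E consumed by split to every fragment except the last
    my $result = join("E,", @a);
    my @final = split(/,/, $result);
    print "@final\n";
}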
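The split-then-rejoin steps in this answer (`split /E(?!P)/`, then `join("E,", ...)` and `split /,/`) give the same fragments as a single lookbehind split like the `split /(?<=E)(?!P)/` used in the first answer, with one edge case. Because `split` drops trailing empty fields, the rejoin approach loses an E that sits at the very end of the sequence. The minimal sketch below shows this on made-up strings; `rejoin_split` and `lookbehind_split` are hypothetical names for illustration:

```perl
use strict;
use warnings;

# Approach from the answer above: split consumes the E, join re-appends it
sub rejoin_split {
    my ($seq) = @_;
    my @a = split /E(?!P)/, $seq;
    return split /,/, join("E,", @a);
}

# Single split that keeps the E via a zero-width lookbehind
sub lookbehind_split {
    my ($seq) = @_;
    return split /(?<=E)(?!P)/, $seq;
}

# Illustrative inputs only: the second one ends in E
for my $seq ('AEBEPC', 'AEBE') {
    print "$seq: rejoin=[@{[ rejoin_split($seq) ]}] lookbehind=[@{[ lookbehind_split($seq) ]}]\n";
}
```

For `AEBE` the rejoin version returns `AE B` while the lookbehind version returns `AE BE`. The sequence in the question ends in N, so both approaches agree there.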
骚年 ilove
Post #4 · 2019-09-21 23:32

To split the string at every 'E' without consuming it in the process, use a lookbehind:

my @final = split /(?<=E)/, $str;

For finer control over which 'E' to split on (which you left unspecified), change the regex accordingly.
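For example, on a short made-up string containing an `EP` pair, the plain lookbehind splits after every E, while adding the `(?!P)` lookahead (as in the first answer) leaves an E that is followed by a P unsplit. A minimal sketch:

```perl
use strict;
use warnings;

my $str = 'AEPEBEC';    # illustrative input, not from the question

# Split after every E; the E stays at the end of each fragment
my @every = split /(?<=E)/, $str;

# Split after an E only when the next character is not P
my @not_before_p = split /(?<=E)(?!P)/, $str;

print join(' ', @every), "\n";          # AE PE BE C
print join(' ', @not_before_p), "\n";   # AEPE BE C
```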

If a variable-length lookbehind is needed, one could use \K...
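As a sketch of the `\K` idea, assuming Perl 5.10 or later (where `\K` is available) and a made-up input: `\K` drops everything matched so far from the reported match, so inside `split` the separator becomes empty and the matched E's stay with the preceding fragment. Unlike a lookbehind, which Perl restricts in length, the part before `\K` can be variable-length, such as `E+`:

```perl
use strict;
use warnings;

my $str = 'AEEBEC';    # illustrative input, not from the question

# E+ may match a run of E's of any length; \K keeps them out of the
# separator, so each fragment ends with its run of E's
my @parts = split /E+\K/, $str;

print join(' ', @parts), "\n";    # AEE BE C
```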
