
perl loops within subroutines to display the longe

Posted 2019-09-16 22:14

Question:

This question already has an answer here:

  • Find longest repeating string based on input in perl (using subroutines) 2 answers

I was wondering if anyone knows how to simplify or generalize this code. It gives the correct answer, but it only works for the current situation. My code is as follows:

sub longestRepeat {
    # argument list @_ is: (sequence, nucleotide)
    my $someSequence = shift(@_);  # shift off the first  argument from the list
    my $whatBP       = shift(@_);  # shift off the second argument from the list
    my $match = 0;

    if ($whatBP eq "AT") {
        if ($someSequence =~ m/(([A][T])\2\2\2\2\2)/g) {
            $match = $1;
        }
        return $match;
    }

    if ($whatBP eq "TAGA") {
        if ($someSequence =~ m/(([T][A][G][A])\2\2)/g) {
            $match = $1;
        }
        return $match;
    }

    if ($whatBP eq "C") {
        if ($someSequence =~ m/(([C])\2\2)/g) {
            $match = $1;
        }
        return $match;
    }
}

My question: in the second if statement, the pattern is repeated a fixed number of times (the number that fits the string we were given). Is there a way to use a while loop to keep searching for further repeats of \2 (the repeated pattern)? In other words, can this: if ($someSequence =~ m/(([A][T])\2\2\2\2\2)/g) be simplified and generalized with a while loop?

Answer 1:

Based on the name of your subroutine, I'm assuming that you want to find the longest repeat sequence in your sequence.

If so, how about the following:

sub longest_repeat {

    my ( $sequence, $what ) = @_;

    my @matches = $sequence =~ /((?:$what)+)/g ;  # Store all matches

    my $longest;
    foreach my $match ( @matches ) {  # Could also avoid temp variable :
                                      # for my $match ( $sequence =~ /((?:$what)+)/g )

        $longest //= $match ;         # Initialize
                                      #  (could also do `$longest = $match
                                      #                    unless defined $longest`)

        $longest = $match if length( $longest ) < length( $match );
    }

    return $longest;  # Note this also handles the case of no matches (returns undef)
}
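
Since the question asked specifically about a while loop, here is a minimal sketch of the same idea written that way. A match with /g in scalar context resumes where the previous match ended, so each pass through the loop sees the next run of repeats. The name longest_repeat_loop and the sample sequence are made up for illustration, and the \Q...\E quoting is an addition that assumes $what should be matched literally rather than as a regex.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): walk over every run
# of consecutive copies of $what and remember the longest one seen so far.
sub longest_repeat_loop {
    my ( $sequence, $what ) = @_;

    my $longest;    # stays undef if $what never occurs
    # Scalar-context /g: each iteration continues after the previous match.
    # \Q...\E quotes any regex metacharacters in $what (an assumption that
    # $what is a literal string such as "AT").
    while ( $sequence =~ /((?:\Q$what\E)+)/g ) {
        my $run = $1;
        $longest = $run
            if !defined $longest || length($run) > length($longest);
    }
    return $longest;
}

# Illustrative input: runs of "AT" are "AT" and "ATATAT".
print longest_repeat_loop( 'ATGGATATATCC', 'AT' ), "\n";    # prints ATATAT
```

Like the version above, this returns undef when there is no match, so a caller can tell "no repeat found" apart from a real result.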

If you can digest that, the following version achieves essentially the same functionality with a Schwartzian transform:

sub longest_repeat {

    my ( $sequence, $what ) = @_;                          # Example:
                                                           # --------------------
    my ( $longest ) = map { $_->[0] }                      # 'ATAT' ...
                        sort { $b->[1] <=> $a->[1] }       # ['ATAT',4], ['AT',2]
                          map { [ $_, length($_) ] }       # ['AT',2], ['ATAT',4]
                            $sequence =~ /((?:$what)+)/g ; # ... 'AT', 'ATAT'

    return $longest ;
}

Some may argue that sorting is wasteful because it is O(n log n) instead of O(n), but there's variety for ya.
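
For anyone who wants the single-pass O(n) behaviour without writing the loop by hand, one possible sketch uses reduce from the core List::Util module. The name longest_repeat_reduce and the sample input are hypothetical, and \Q...\E again assumes $what is a literal string.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Hypothetical variant: one pass over the matches, no sorting.
sub longest_repeat_reduce {
    my ( $sequence, $what ) = @_;

    # List-context /g returns every captured run of repeats.
    my @runs = $sequence =~ /((?:\Q$what\E)+)/g;

    # Keep the longer of each pair; on a tie the earlier run wins.
    # reduce returns undef for an empty list, i.e. when nothing matched.
    return reduce { length($a) >= length($b) ? $a : $b } @runs;
}

print longest_repeat_reduce( 'ATGGATATATCC', 'AT' ), "\n";    # prints ATATAT
```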