Counting the number of occurrences of a string inside another string

Posted 2019-01-08 21:32

Question:

What is the fastest way to count the number of times a certain string appears in a bigger one? My best guess would be to replace all instances of that string with nothing, calculate the difference in lengths, and divide by the length of the substring. But that seems rather slow, and I need to analyze large amounts of data.
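The replace-and-measure approach described above can be sketched in Perl as a baseline to compare against the answers. The sub name count_by_removal is hypothetical. It counts non-overlapping occurrences and copies the whole input string, which costs memory on large data.

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of $needle in
# $haystack by deleting them and measuring how much shorter the string got.
sub count_by_removal {
    my ($needle, $haystack) = @_;
    return 0 unless length $needle;                   # avoid dividing by zero
    (my $stripped = $haystack) =~ s/\Q$needle\E//g;   # copy, then delete every occurrence
    return (length($haystack) - length($stripped)) / length($needle);
}

print count_by_removal("foo", "foo foo foo bar"), "\n";   # prints 3
```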

Answer 1:

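You can capture the matches and then count them. Assigning the match to the empty list () puts it in list context, and a list assignment evaluated in scalar context returns the number of elements on its right-hand side: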

my $x = "foo";
my $y = "foo foo foo bar";
# \Q...\E makes $x match literally, even if it contains regex metacharacters
my $c = () = $y =~ /\Q$x\E/g;  # $c is now 3

You can also capture the matches into an array and take its size. Same principle, different technique:

my @c = $y =~ /\Q$x\E/g;   # one element per match
my $count = @c;            # an array in scalar context gives its length

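Why escaping matters when the search string comes from a variable: interpolated into a pattern, characters such as "." are regex metacharacters, so the count can be too high. A small sketch with made-up data:

```perl
use strict;
use warnings;

my $haystack = "abc a.c";
my $needle   = "a.c";

# Interpolated as a pattern, "." matches any character, so "abc" also counts.
my $raw = () = $haystack =~ /$needle/g;

# \Q...\E (quotemeta) escapes the metacharacters, so only the literal "a.c" counts.
my $literal = () = $haystack =~ /\Q$needle\E/g;

print "raw=$raw literal=$literal\n";   # prints raw=2 literal=1
```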

Answer 2:

my $string = "aaaabbabbba";
my @count = ($string =~ /a/g);   # list of every "a" matched
print scalar(@count), "\n";      # prints 6

or

my $count = ($string =~ s/a/a/g);  # s///g returns the number of substitutions (6), but it does rewrite $string
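When the thing being counted is a single character, as with "a" here, a different technique is the transliteration operator tr///: it returns the number of characters it matched, and with an empty replacement list it leaves the string unchanged. It does not interpolate variables, so this sketch only applies when the character is fixed in the source code.

```perl
use strict;
use warnings;

my $string = "aaaabbabbba";

# tr/a// counts each "a"; the empty replacement list means nothing is changed.
my $count = ($string =~ tr/a//);

print "$count\n";    # prints 6
print "$string\n";   # still aaaabbabbba
```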


Answer 3:

You could use a global regex. Something like:

my @matches = $bigstring =~ /(\Q$littlestring\E)/g;   # \Q...\E: match the text literally
my $count = @matches;                                 # number of elements = number of matches
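For the large inputs mentioned in the question, the list-based versions build a list of every match before counting it. A sketch that avoids that list uses a scalar-context //g match in a loop, which steps through the string one match at a time. The values given to $bigstring and $littlestring here are made-up examples.

```perl
use strict;
use warnings;

# Example values; in the answer these come from elsewhere.
my $bigstring    = "foo foo foo bar";
my $littlestring = "foo";

my $count = 0;
# In scalar context, m//g resumes from the previous match position (pos),
# so each iteration finds the next non-overlapping occurrence.
$count++ while $bigstring =~ /\Q$littlestring\E/g;

print "$count\n";   # prints 3
```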


Answer 4:

Just for completeness: you can also call the index function repeatedly in a loop, counting each time it finds the substring and advancing the starting position after each hit. This avoids regexes altogether, and in my testing it is a bit faster than the regex solutions.
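Relative speed depends on the data, so it is worth measuring on your own input. A minimal sketch using the core Benchmark module's cmpthese, with made-up test data and a hypothetical count_index helper that skips past each hit (non-overlapping, to match the regex versions):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up test data: 10,000 copies of "foo bar baz" joined by spaces.
my $needle   = "foo";
my $haystack = join " ", ("foo bar baz") x 10_000;

# Hypothetical helper: index-based, non-overlapping count.
sub count_index {
    my ($sub, $str) = @_;
    return 0 unless length $sub;
    my ($pos, $n) = (0, 0);
    while (($pos = index($str, $sub, $pos)) >= 0) {
        $n++;
        $pos += length $sub;   # jump past this hit
    }
    return $n;
}

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    regex_list => sub { my $c = () = $haystack =~ /\Q$needle\E/g },
    regex_loop => sub { my $c = 0; $c++ while $haystack =~ /\Q$needle\E/g },
    index_loop => sub { my $c = count_index($needle, $haystack) },
});
```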

I adapted the following sub from http://www.misc-perl-info.com/perl-index.html:

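sub occurrences {

    my ($x, $y) = @_;    # $x: substring to find, $y: string to search

    return 0 unless length $x;   # an empty $x would make the loop never end

    my $pos     = 0;
    my $matches = 0;

    while (1) {
        $pos = index($y, $x, $pos);
        last if $pos < 0;        # no further occurrence
        $matches++;
        # Step one character, so overlapping hits count ("aa" in "aaaa" gives 3);
        # use $pos += length $x for non-overlapping counts like the regex answers.
        $pos++;
    }

    return $matches;
}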
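Stepping one character after each hit counts overlapping occurrences, while the regex answers count non-overlapping ones, so the two can disagree. A sketch showing both behaviours; the sub count_index and its $overlap flag are hypothetical names:

```perl
use strict;
use warnings;

# Count occurrences of $needle in $haystack with index().
# $overlap true: advance 1 char after each hit (like the sub above);
# false: skip past the whole hit (like the regex answers).
sub count_index {
    my ($needle, $haystack, $overlap) = @_;
    return 0 unless length $needle;   # empty needle would loop forever
    my ($pos, $n) = (0, 0);
    while (($pos = index($haystack, $needle, $pos)) >= 0) {
        $n++;
        $pos += $overlap ? 1 : length $needle;
    }
    return $n;
}

print count_index("aa", "aaaa", 1), "\n";   # prints 3
print count_index("aa", "aaaa", 0), "\n";   # prints 2
my $re = () = "aaaa" =~ /aa/g;
print "$re\n";                              # prints 2
```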