matching finite natural number series

2020-06-21 19:03发布

问题:

How can I match finite natural number series with regex?

So, requirements are:

  • string contains numbers and spaces (as delimiters)
  • first number is 1
  • each number (except the first one) is equal to previous number + 1

Should be matched:

  • 1
  • 1 2
  • 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
  • long series of consequent numbers from 1 to 10^1000

Should not be matched:

  • ``
  • 1 3 4
  • 1 2 3 4 5 6 6

Beside that there are some requirements to regex:

  • it should be single one-shot expression, and not a cycle-condition-algorithm bale of instructions
  • it could use all power of perl regular expressions

I'm not sure that regex are actually lazy, so it would be great if they are. Because the natural number series is non-finite in it's original meaning from number theory.

And the last one. Please notice, that I'm not using the wrong tool for that job. It's not a real world programming task at all.

回答1:

Here you go. Tested on Perl v5.10 through v5.14. The key is the recursive pattern, where we recurse on the (?&Sequence) rule. It’s something of a proof by induction.

The bigint is there just in case you really want to generate a sequence from 1 .. 10**10_000. It will run considerably faster if you can limit yourself to machine native ints, 32-bit or 64-bit depending on your platform.

#!/usr/bin/env perl
use v5.10;
use bigint;  # only if you need stuff over maxint

my $pat = qr{
    ^
    (?= 1 \b )
    (?<Sequence>
        (?<Number> \d+ )
        (?:
            \s+
            (??{  "(?=" . (1 + $+{Number}) . ")" })
            (?&Sequence)
        )?
    )
    $
}x;

# first test embedded data
while (<DATA>) {
    if ( /$pat/ ) {
        print "PASS: ", $_;

    } else {
        print "FAIL: ", $_;
    }
}

# now generate long sequences
for my $big ( 2, 10, 25, 100, 1000, 10_000, 100_000 ) {
    my $str = q();
    for (my $i = 1; $i <= $big; $i++) {
        $str .= "$i ";
    }
    chop $str;
    if ($str =~ $pat) {
        print "PASS: ";
    } else {
        print "FAIL: ";
    }
    if (length($str) > 60) {
        my $len = length($str);
        my $first = substr($str,   0, 10);
        my $last  = substr($str, -10);
        $str = $first . "[$len chars]" . $last;
    }
    say $str;

}


__END__
5
fred
1
1 2 3
1 3 2
1 2 3 4 5
1 2 3 4 6
2 3 4 6
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
1 2 3 4 5 6 6

Which run produces:

FAIL: 5
FAIL: fred
PASS: 1
PASS: 1 2 3
FAIL: 1 3 2
PASS: 1 2 3 4 5
FAIL: 1 2 3 4 6
FAIL: 2 3 4 6
PASS: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
FAIL: 1 2 3 4 5 6 6
PASS: 1 2
PASS: 1 2 3 4 5 6 7 8 9 10
PASS: 1 2 3 4 5 [65 chars]2 23 24 25
PASS: 1 2 3 4 5 [291 chars] 98 99 100
PASS: 1 2 3 4 5 [3892 chars]8 999 1000
PASS: 1 2 3 4 5 [588894 chars]999 100000

At the risk of seeming self-serving, there is a book that covers this sort of thing. See the section on “Fancy Patterns” in Chapter 5 of Programming Perl, 4ᵗʰ edition. You’ll want to check out the new sections on “Named Groups”, on “Recursive Patterns”, and on “Grammatical Patterns”. The book is at the printers, and should be available electronically in a day or two.



回答2:

Try next regex (in perl):

m/\A((??{ our $i += 1 })(?>\s*))+\Z/

Test:

Content of script.pl:

use warnings;
use strict;

while ( <DATA> ) { 
    chomp;
    our $i = 0;
    printf qq[%s\n], $_ if m/\A((??{ our $i += 1 })(?>\s*))+\Z/;
}

__DATA__
0
2
1
1 3 4
1 2
1 2 3 4 5 6 6
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
2 1
1 2 3 4 5 7
1           2            3    

Run the script:

perl script.pl

And result:

1
1 2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
1           2            3 


回答3:

i don't think there is a possible pattern to fullfil your requirements, because regexes principally match on text; there is no calculation while matching

however you could build your own automaton that performs calculation, or you simply iterate over the numbers, which should be way more efficient



标签: regex perl