How can I match finite natural number series with regex?
So, requirements are:
- string contains numbers and spaces (as delimiters)
- first number is
1
- each number (except the first one) is equal to previous number + 1
Should be matched:
1
1 2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
- long series of consequent numbers from 1 to 10^1000
Should not be matched:
Beside that there are some requirements to regex:
- it should be single one-shot expression, and not a cycle-condition-algorithm bale of instructions
- it could use all power of
perl
regular expressions
I'm not sure that regex are actually lazy, so it would be great if they are. Because the natural number series is non-finite in it's original meaning from number theory.
And the last one. Please notice, that I'm not using the wrong tool for that job. It's not a real world programming task at all.
Here you go. Tested on Perl v5.10 through v5.14. The key is the recursive pattern, where we recurse on the (?&Sequence)
rule. It’s something of a proof by induction.
The bigint
is there just in case you really want to generate a sequence from 1 .. 10**10_000
. It will run considerably faster if you can limit yourself to machine native ints, 32-bit or 64-bit depending on your platform.
#!/usr/bin/env perl
use v5.10;
use bigint; # only if you need stuff over maxint
my $pat = qr{
^
(?= 1 \b )
(?<Sequence>
(?<Number> \d+ )
(?:
\s+
(??{ "(?=" . (1 + $+{Number}) . ")" })
(?&Sequence)
)?
)
$
}x;
# first test embedded data
while (<DATA>) {
if ( /$pat/ ) {
print "PASS: ", $_;
} else {
print "FAIL: ", $_;
}
}
# now generate long sequences
for my $big ( 2, 10, 25, 100, 1000, 10_000, 100_000 ) {
my $str = q();
for (my $i = 1; $i <= $big; $i++) {
$str .= "$i ";
}
chop $str;
if ($str =~ $pat) {
print "PASS: ";
} else {
print "FAIL: ";
}
if (length($str) > 60) {
my $len = length($str);
my $first = substr($str, 0, 10);
my $last = substr($str, -10);
$str = $first . "[$len chars]" . $last;
}
say $str;
}
__END__
5
fred
1
1 2 3
1 3 2
1 2 3 4 5
1 2 3 4 6
2 3 4 6
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
1 2 3 4 5 6 6
Which run produces:
FAIL: 5
FAIL: fred
PASS: 1
PASS: 1 2 3
FAIL: 1 3 2
PASS: 1 2 3 4 5
FAIL: 1 2 3 4 6
FAIL: 2 3 4 6
PASS: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
FAIL: 1 2 3 4 5 6 6
PASS: 1 2
PASS: 1 2 3 4 5 6 7 8 9 10
PASS: 1 2 3 4 5 [65 chars]2 23 24 25
PASS: 1 2 3 4 5 [291 chars] 98 99 100
PASS: 1 2 3 4 5 [3892 chars]8 999 1000
PASS: 1 2 3 4 5 [588894 chars]999 100000
At the risk of seeming self-serving, there is a book that covers this sort of thing. See the section on “Fancy Patterns” in Chapter 5 of Programming Perl, 4ᵗʰ edition. You’ll want to check out the new sections on “Named Groups”, on “Recursive Patterns”, and on “Grammatical Patterns”. The book is at the printers, and should be available electronically in a day or two.
Try next regex (in perl
):
m/\A((??{ our $i += 1 })(?>\s*))+\Z/
Test:
Content of script.pl
:
use warnings;
use strict;
while ( <DATA> ) {
chomp;
our $i = 0;
printf qq[%s\n], $_ if m/\A((??{ our $i += 1 })(?>\s*))+\Z/;
}
__DATA__
0
2
1
1 3 4
1 2
1 2 3 4 5 6 6
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
2 1
1 2 3 4 5 7
1 2 3
Run the script:
perl script.pl
And result:
1
1 2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
1 2 3
i don't think there is a possible pattern to fullfil your requirements, because regexes principally match on text; there is no calculation while matching
however you could build your own automaton that performs calculation, or you simply iterate over the numbers, which should be way more efficient