PHP: preg_match_all() - how to find all occurrences

Posted 2019-08-26 04:48

Question:

My task is to find all runs of consecutive digits in a string that contains only digits. I am not looking for a better way to build the regex, but for a regex that correctly matches all such substrings.

This is how I build my regex:

$regex = "";

for($i=0;$i<10;$i++) {
    $str = "";
    for($a=0;$a<10;$a++) {
        if($a > $i) {
            $str .= $a;
            if(strlen($str)>1) {
              $regex .= "|".$str."";
            }
        }
    }
}

$myregex = "/".ltrim($regex,"|")."/";
echo $myregex;

Result:

/12|123|1234|12345|123456|1234567|12345678|123456789|23|234|2345|23456|234567|2345678|23456789|34|345|3456|34567|345678|3456789|45|456|4567|45678|456789|56|567|5678|56789|67|678|6789|78|789|89/

Then I do:

$literal = '234121678941251236544567812122345678';
$matches = [];
preg_match_all($myregex,$literal,$matches);
var_dump($matches);

Result:

array(1) {
  [0]=>
  array(13) {
    [0]=>
    string(2) "23"
    [1]=>
    string(2) "12"
    [2]=>
    string(2) "67"
    [3]=>
    string(2) "89"
    [4]=>
    string(2) "12"
    [5]=>
    string(2) "12"
    [6]=>
    string(2) "45"
    [7]=>
    string(2) "67"
    [8]=>
    string(2) "12"
    [9]=>
    string(2) "12"
    [10]=>
    string(2) "23"
    [11]=>
    string(2) "45"
    [12]=>
    string(2) "67"
  }
}

However, I want to find every substring that occurs (and not skip to the characters after a match), like:

23,234,34,12,67,678,6789,78,789,89,12, ...

I have tried different variations with brackets, +, ... and did not figure out the correct regex to find all matches (sorry, I'm still a bit of a regex noob). How do I have to modify the regular expression?

Answer 1:

The order of the alternatives in the regex is important. I'm not sure whether this fully solves the issue, and doing it this way may be fundamentally flawed, but you can try this:

$regex = [];

for($i=0;$i<10;$i++) {
    $str = "";
    for($a=0;$a<10;$a++) {
        if($a > $i) {
            $str .= $a;
            if(strlen($str)>1) {
              $regex[] = $str;
            }
        }
    }
}

usort($regex, function($a,$b){
    return strlen($b) <=> strlen($a);
});

$myregex = '/'.implode('|', $regex).'/';

This collects the number sequences in an array, then sorts them by length so that the longest sequences come first. The end result after matching is this:

array(1) {
  [0]=>
  array(9) {
    [0]=>
    string(3) "234"
    [1]=>
    string(2) "12"
    [2]=>
    string(4) "6789"
    [3]=>
    string(2) "12"
    [4]=>
    string(3) "123"
    [5]=>
    string(5) "45678"
    [6]=>
    string(2) "12"
    [7]=>
    string(2) "12"
    [8]=>
    string(7) "2345678"
  }
}
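
To see why the order matters, here is a minimal sketch with just two alternatives. PCRE tries the alternatives from left to right at each position and takes the first one that succeeds, so a shorter prefix listed first hides the longer sequence. The variable names $shortFirst and $longFirst are only for this illustration.

```php
<?php
// Shorter alternative first: "12" succeeds, so "123" is never tried.
preg_match('/12|123/', '123', $shortFirst);

// Longer alternative first: "123" is tried first and succeeds.
preg_match('/123|12/', '123', $longFirst);

echo $shortFirst[0], "\n"; // 12
echo $longFirst[0], "\n";  // 123
```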

Also note that the spaceship operator <=> only works in PHP 7+.
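
On older PHP versions, subtracting the lengths should work as the comparator instead, since usort() only needs a negative, zero, or positive integer. A small sketch, assuming PHP 5.3 or later (for the anonymous function); the sample array is made up just to exercise the sort.

```php
<?php
// Made-up sample data, not the full list from the answer.
$regex = array('12', '1234', '123', '23');

// Longest first: positive when $b is longer than $a, so $b sorts earlier.
usort($regex, function ($a, $b) {
    return strlen($b) - strlen($a);
});

// The order of equal-length entries is not guaranteed before PHP 8.
echo implode('|', $regex), "\n";
```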

Hope it helps.


Regarding "and not go to the next chars after a match":

I don't think this is possible with regex, if you mean you want to find 23, 234 and 2345 all at once in 2345607, for example. preg_match_all() continues searching from the end of the previous match, so it never returns overlapping matches. However, if the pattern matches a long sequence, every shorter sequence inside it must logically match as well. So you could take each match and trim digits off the right-hand end until the length is 2 to get the remaining matches.
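
A rough sketch of that trimming idea, assuming the longest-first pattern described above. The helper name expandRun is made up for this example and is not a built-in. It goes one step further than trimming from the right: it also shifts the starting position, because trimming alone would turn 234 into 234 and 23 but miss 34.

```php
<?php
// Hypothetical helper: list every sub-run of length >= 2 inside one match.
function expandRun($run)
{
    $parts = array();
    $len = strlen($run);
    for ($start = 0; $start <= $len - 2; $start++) {
        for ($size = 2; $start + $size <= $len; $size++) {
            $parts[] = substr($run, $start, $size);
        }
    }
    return $parts;
}

// Build the same 36 ascending sequences (12 ... 89) as the loops above.
$alternatives = array();
for ($i = 1; $i <= 8; $i++) {
    $str = (string) $i;
    for ($a = $i + 1; $a <= 9; $a++) {
        $str .= $a;
        $alternatives[] = $str;
    }
}

// Longest first, so each match is the longest run at its position.
usort($alternatives, function ($a, $b) {
    return strlen($b) - strlen($a);
});
$myregex = '/' . implode('|', $alternatives) . '/';

$literal = '234121678941251236544567812122345678';
preg_match_all($myregex, $literal, $matches);

// Expand every longest match into all of its sub-runs.
$all = array();
foreach ($matches[0] as $run) {
    foreach (expandRun($run) as $part) {
        $all[] = $part;
    }
}

// Output starts with: 23,234,34,12,67,678,6789,78,789,89,12
echo implode(',', $all), "\n";
```

The expansion happens in plain PHP after matching, so the regex itself stays the simple alternation and only has to find the longest run at each position.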