Given the code:
$my_str = '
Rollo is*
My dog*
And he\'s very*
Lovely*
';
preg_match_all('/\S+(?=\*$)/m', $my_str, $end_words);
print_r($end_words);
In PHP 7.3.2 (XAMPP) I get the unexpected output
Array ( [0] => Array ( ) )
Whereas in PHPFiddle, on PHP 7.0.33, I get what I expected:
Array ( [0] => Array ( [0] => is [1] => dog [2] => very [3] => Lovely ) )
Can anyone tell me why I'm getting this difference, and whether something changed in regex behaviour after 7.0.33?
It seems that in your environment the PCRE library was compiled without the PCRE_NEWLINE_ANY option, so in multiline mode $ only matches before the LF symbol and . matches any symbol but LF.
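One way to check whether this applies is to look at which line-ending bytes the subject string actually contains. The sketch below assumes the failing script was saved with CRLF line endings, which the question does not confirm. The helper name describe_line_endings is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper (not from the original post): reports which
// line-ending sequence appears in a string, checking CRLF first.
function describe_line_endings(string $text): string
{
    if (strpos($text, "\r\n") !== false) {
        return 'CRLF';
    }
    if (strpos($text, "\r") !== false) {
        return 'CR';
    }
    if (strpos($text, "\n") !== false) {
        return 'LF';
    }
    return 'none';
}

// Sample inputs are assumptions made for illustration.
echo describe_line_endings("My dog*\r\n"), "\n";
echo describe_line_endings("My dog*\n"), "\n";
```

If the result is CRLF, a CR sits between the * and the LF, so \*$ cannot succeed when $ only matches before LF.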
You can fix it by using the PCRE (*ANYCRLF) verb:
'~(*ANYCRLF)\S+(?=\*$)~m'
(*ANYCRLF) specifies a newline convention: (*CR), (*LF) or (*CRLF), and is equivalent to the PCRE_NEWLINE_ANYCRLF option. See the PCRE documentation:
PCRE_NEWLINE_ANYCRLF specifies that any of the three preceding sequences should be recognized.
In the end, this PCRE verb makes . match any char except CR and LF, and $ will match right before either of these two chars.
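The effect can be reproduced regardless of how PCRE was compiled by stating the newline convention explicitly in both patterns. This is a minimal sketch: the CRLF line endings in the sample string are an assumption about the failing environment, and the variable names are illustrative.

```php
<?php
// Sample input with explicit CRLF line endings (assumed for illustration).
$my_str = "\r\nRollo is*\r\nMy dog*\r\nAnd he's very*\r\nLovely*\r\n";

// (*LF): $ matches only right before "\n", so the "\r" after each "*"
// prevents \*$ from matching.
preg_match_all('/(*LF)\S+(?=\*$)/m', $my_str, $lf_only);

// (*ANYCRLF): $ matches before CR, LF, or CRLF.
preg_match_all('/(*ANYCRLF)\S+(?=\*$)/m', $my_str, $any_crlf);

print_r($lf_only[0]);   // empty array
print_r($any_crlf[0]);  // is, dog, very, Lovely
```

The verb has to appear at the very start of the pattern, immediately after the opening delimiter.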
See more about this and other verbs at rexegg.com:
By default, when PCRE is compiled, you tell it what to consider to be a line break when encountering a . (as the dot doesn't match line breaks unless in dotall mode), as well as the ^ and $ anchors' behavior in multiline mode. You can override this default with the following modifiers:
✽ (*CR): Only a carriage return is considered to be a line break
✽ (*LF): Only a line feed is considered to be a line break (as on Unix)
✽ (*CRLF): Only a carriage return followed by a line feed is considered to be a line break (as on Windows)
✽ (*ANYCRLF): Any of the above three is considered to be a line break
✽ (*ANY): Any Unicode newline sequence is considered to be a line break
For instance, (*CR)\w+.\w+ matches Line1\nLine2 because the dot is able to match the \n, which is not considered to be a line break. See demo.
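The (*CR) example can be tried in PHP as a small sketch. The (*LF) pattern is added here only for contrast and is not part of the quoted text.

```php
<?php
$subject = "Line1\nLine2";

// With (*CR), only "\r" counts as a line break, so the dot may consume "\n".
preg_match('/(*CR)\w+.\w+/', $subject, $with_cr);

// With (*LF), the dot cannot cross "\n", so the match stays inside "Line1".
preg_match('/(*LF)\w+.\w+/', $subject, $with_lf);

var_dump($with_cr[0] === "Line1\nLine2"); // bool(true)
var_dump($with_lf[0]);                    // string(5) "Line1"
```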