preg_replace + UTF-8 doesn't work on one server

Posted 2019-06-21 16:21

Question:

echo preg_match("/\b(בדיקה|מילה)\b/iu", "זוהי בדיקה");

For some reason, this code returns 1 on several servers I've tested it on, but 0 on one specific server.

PCRE is compiled with UTF-8 support and Unicode properties support. What could be the issue?

Answer 1:

The servers may be running different versions of the PCRE library that PHP uses.

PHP and PCRE versions: http://php.net/pcre.installation

You need PCRE 8.10 or later (PHP 5.3.4+).

Version 8.10 25-Jun-2010:

  1. Added PCRE_UCP to make \b, \d, \s, \w, and certain POSIX character classes use Unicode properties. (*UCP) at the start of a pattern can be used to set this option. Modified pcretest to add /W to test this facility. Added REG_UCP to make it available via the POSIX interface.

Edit: I just ran some tests. It gives 1 on PHP 5.3.10 and 0 on PHP 5.3.2 and PHP 5.3.3.
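To see which side of the 8.10 line a given server falls on, a small check along these lines may help. It is only a sketch: the helper names pcreVersionNumber and pcreSupportsUcp are made up for illustration, and it assumes the PCRE_VERSION constant has the usual "number, space, release date" shape. Type declarations are left out on purpose so the script can also run on old PHP 5.3 installs.

```php
<?php
// Hypothetical helper: PCRE_VERSION looks like "8.10 2010-06-25",
// so keep only the part before the first space.
function pcreVersionNumber($raw)
{
    $parts = explode(' ', trim($raw));
    return $parts[0];
}

// Hypothetical helper: true when the PCRE version is at least 8.10,
// the release whose changelog entry introduces PCRE_UCP / (*UCP).
function pcreSupportsUcp($raw)
{
    return version_compare(pcreVersionNumber($raw), '8.10', '>=');
}

// Report what this server is actually running.
echo 'PHP ', PHP_VERSION, ', PCRE ', PCRE_VERSION, "\n";
echo pcreSupportsUcp(PCRE_VERSION) ? "UCP available\n" : "UCP not available\n";
```

Running it on each server and comparing the output should show whether the odd one out simply has an older PCRE.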



Answer 2:

It might depend on the version of the PCRE library. To make the behavior more consistent across servers, try the (*UCP) verb: preg_match('/(*UCP)\b(בדיקה|מילה)\b/iu', 'זוהי בדיקה').

This still requires PCRE 8.10, which has shipped with PHP since 5.3.4; on older PHP versions it is only available when PHP is compiled against a newer external PCRE via --with-pcre-regex=DIR.

Ref (in Russian)
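If upgrading PCRE on the problem server is not an option, one possible workaround is to avoid \b altogether. The sketch below is not from either answer: buildWordPattern is a hypothetical helper that uses (*UCP) when PCRE is new enough and otherwise emulates the word boundary with lookarounds on \p{L}, \p{N} and "_". The fallback assumes only what the question already states, namely that PCRE was built with UTF-8 and Unicode property support, and it assumes the source file is saved as UTF-8.

```php
<?php
// Hypothetical helper: builds a whole-word pattern for a list of literal words.
// $useUcp = true  -> rely on (*UCP) so \b uses Unicode properties (PCRE 8.10+).
// $useUcp = false -> emulate \b with lookarounds on letters, digits and "_".
function buildWordPattern(array $words, $useUcp)
{
    $quoted = array();
    foreach ($words as $word) {
        // Escape regex metacharacters and the "/" delimiter.
        $quoted[] = preg_quote($word, '/');
    }
    $alternation = implode('|', $quoted);

    if ($useUcp) {
        return '/(*UCP)\b(?:' . $alternation . ')\b/iu';
    }
    return '/(?<![\p{L}\p{N}_])(?:' . $alternation . ')(?![\p{L}\p{N}_])/iu';
}

// PCRE_VERSION looks like "8.10 2010-06-25"; take the number before the space.
list($pcreNumber) = explode(' ', PCRE_VERSION);
$hasUcp = version_compare($pcreNumber, '8.10', '>=');

$words = array('בדיקה', 'מילה');
echo preg_match(buildWordPattern($words, $hasUcp), 'זוהי בדיקה'), "\n";
```

The lookaround form is slightly more verbose than \b, but it should give the same answer on old and new PCRE builds because it never depends on how \b or \w are defined.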