PHP - support for multibyte safe regular expressions

Posted 2019-05-11 18:52

PHP supports regular expressions in three ways:

Today the web is Unicode, and since 5.6 so is PHP, for the sake of i18n. While PHP itself is known to be abysmally bad at supporting Unicode, the Intl extension provides access to the ICU library, which is a relief.

To avoid the long wait for UString, and the repetition (and memory) involved when doin' it right, I prefer Intl, leave out iconv and Multibyte String along with DateTime, and rewrite most of the SBCS string functions to be multibyte safe. In that process some issues arise:

To use PCRE with Unicode syntax, PHP's built-in PCRE has to be compiled and configured with Unicode support. On some systems it is not configured with Unicode; adding (*UTF8) before the expression overrides the configuration.
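For illustration, here is a minimal sketch of how the difference shows up in practice. It assumes PHP 7 or later (for the `\u{...}` escape); the helper name `pcreSupportsUtf8` is made up for this example. As far as I know, (*UTF8) only switches UTF-8 mode on from inside the pattern, so it still needs a PCRE library built with UTF-8 support, which is why its result is only dumped here and not relied on.

```php
<?php
// Hypothetical helper (not part of PHP): reports whether this PCRE build
// accepts the /u modifier. preg_match() returns false and raises a warning
// when it does not; @ suppresses that warning.
function pcreSupportsUtf8(): bool
{
    return @preg_match('/^.$/u', "\u{00E9}") === 1;
}

// "café au lait", with the é written as an escape so the file encoding
// does not matter.
$subject = "caf\u{00E9} au lait";

// Without /u the dot matches one byte: only the first byte of "é" is captured.
preg_match('/caf(.)/', $subject, $bytewise);

if (pcreSupportsUtf8()) {
    // With /u, pattern and subject are treated as UTF-8 and the dot matches
    // one code point, i.e. both bytes of "é".
    preg_match('/caf(.)/u', $subject, $unicode);

    // (*UTF8) at the very start of the pattern requests the same mode from
    // inside the expression; whether it is accepted depends on the PCRE build.
    $verbResult = @preg_match('/(*UTF8)caf(.)/', $subject, $verb);

    var_dump(strlen($bytewise[1])); // int(1)
    var_dump(strlen($unicode[1]));  // int(2)
    var_dump($verbResult);          // build-dependent, so no expected value
}
```

Checking support once at start-up, instead of prefixing every pattern, keeps the expressions portable and makes a missing Unicode build fail loudly.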

  • Have I missed a way to work with ICU's regular expression functions from PHP?
  • Are there any other pitfalls to take into account for Unicode PCRE?
  • Have I missed a reason why Multibyte String should be used?
