There are no multibyte 'preg' functions available in PHP, so does that mean the default preg_ functions are all multibyte-safe? I couldn't find any mention of this in the PHP documentation.
No, they are not. See the question preg_match and UTF-8 in PHP for example.
PCRE supports UTF-8 out of the box; see the documentation for the 'u' modifier.
Illustration (\xC3\xA4 is the UTF-8 encoding of the German letter "ä"):
Without the 'u' modifier, this echoes "@@¤@" because "\xC3" and "\xA4" are treated as distinct symbols.
With the 'u' modifier, it prints "@@@" because "\xC3\xA4" is treated as a single letter.
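A minimal sketch of the same effect, assuming the pattern `.` (my choice for illustration, because it does not depend on the locale the way character classes such as `\w` can). Without 'u' every byte counts as one character; with 'u' the two bytes of "ä" count as one.

```php
<?php
// "aäb" encoded as UTF-8: 3 characters, 4 bytes.
$s = "a\xC3\xA4b";

// Without the 'u' modifier PCRE works byte by byte,
// so '.' matches \xC3 and \xA4 separately.
$bytewise = preg_replace('/./', '@', $s);

// With the 'u' modifier the subject is decoded as UTF-8,
// so '.' matches the whole two-byte sequence once.
$utf8 = preg_replace('/./u', '@', $s);

echo $bytewise, "\n"; // @@@@
echo $utf8, "\n";     // @@@
```

Note that with 'u', a subject containing invalid UTF-8 makes the preg_ function fail (preg_replace returns null) rather than falling back to byte matching.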
PCRE can support UTF-8 and other Unicode encodings, but it has to be specified at compile time. From the man page for PCRE 8.0:
PHP currently uses PCRE 7.9; your system might have an older version.
Taking a look at the PCRE lib that comes with PHP 5.2, it appears that it's configured to support Unicode properties and UTF-8. Same for the 5.3 branch.
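If you would rather check your own build than read the source, something like the following sketch should work. It relies on the PCRE_VERSION constant and on the fact that a 'u' pattern fails to compile when UTF-8 support is missing; the variable names are just for illustration.

```php
<?php
// Version string of the PCRE library PHP was built against.
$pcreVersion = PCRE_VERSION;

// If UTF-8 support is missing, a pattern with 'u' fails to compile and
// preg_match() returns false (the warning is suppressed with @).
$hasUtf8 = @preg_match('/^.$/u', "\xC3\xA4") === 1;

// Unicode properties (\p{...}) are a separate compile-time option.
$hasUnicodeProperties = @preg_match('/^\p{L}$/u', "\xC3\xA4") === 1;

echo "PCRE version: ", $pcreVersion, "\n";
echo "UTF-8 support: ", $hasUtf8 ? "yes" : "no", "\n";
echo "Unicode properties: ", $hasUnicodeProperties ? "yes" : "no", "\n";
```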
No, you need to use the multibyte string functions such as mb_ereg.
Some of my more complicated preg functions:
(1a) validate username as alphanumeric + underscore:
(1b) possible UTF alternative:
(2a) validate email:
(2b) possible UTF alternative:
(3a) normalize newlines:
(3b) possible UTF alternative:
Do these changes look alright?
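The patterns themselves are not shown above, so the following is only a sketch of what such pairs might look like. All function names and patterns are hypothetical. The email patterns are deliberately loose and are not a full address validator.

```php
<?php
// (1a) ASCII-only username: letters, digits, underscore.
function isAsciiUsername(string $s): bool {
    return preg_match('/^[A-Za-z0-9_]+$/D', $s) === 1;
}

// (1b) UTF-8 alternative: any Unicode letter or number, plus underscore.
// Invalid UTF-8 makes preg_match() return false, so the check fails.
function isUnicodeUsername(string $s): bool {
    return preg_match('/^[\p{L}\p{N}_]+$/Du', $s) === 1;
}

// (2a) Loose ASCII email check (illustrative only).
function isAsciiEmail(string $s): bool {
    return preg_match('/^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/D', $s) === 1;
}

// (2b) Same shape, but allowing Unicode letters and numbers.
function isUnicodeEmail(string $s): bool {
    return preg_match('/^[\p{L}\p{N}._%+-]+@[\p{L}\p{N}.-]+\.\p{L}{2,}$/Du', $s) === 1;
}

// (3a) Normalize CRLF and lone CR to LF. This is already safe on UTF-8
// input, because the bytes \r and \n never occur inside a multibyte sequence.
function normalizeNewlines(string $s): ?string {
    return preg_replace('/\r\n?/', "\n", $s);
}

// (3b) UTF-8 alternative using \R, which matches any newline sequence
// (CRLF as one unit). Note that \R also matches vertical tab and form feed.
// Returns null if the input is not valid UTF-8.
function normalizeNewlinesUnicode(string $s): ?string {
    return preg_replace('/\R/u', "\n", $s);
}
```

The D modifier keeps `$` from matching before a trailing newline, which matters for validation. The 'u' versions only make sense if the input really is UTF-8; anything else is rejected rather than matched byte by byte.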