PHP preg_replace with UTF-8 not working

2019-07-25 15:59发布

问题:

Why is this preg_replace not working?

FYI, I have the PHP script set to UTF8 Without BOM and I have the function here set to remove all matches of the pattern (instead of what I will actually do, which is remove all non-matches) because that is easier for testing. Note also that the character is not in my regex, so this should be the only character left behind.

$string='The Story of Jewād';
echo preg_replace('@([!"#$&’\(\)\*\+,\-\./0123456789:;<=>\?ABCDEFGHIJKLMNOPQRSTUVWXYZ\[\\\]\^_‘abcdefghijklmnopqrstuvwxyz\{\|\}~¡¢£⁄¥ƒ§¤“«‹›fifl–†‡·¶•‚„”»…‰¿`´ˆ˜¯˘˙¨˚¸˝˛ˇ—ƪŁØŒºæıłøœß÷¾¼¹×®Þ¦Ð½−çð±Çþ©¬²³™°µ ÁÂÄÀÅÃÉÊËÈÍÎÏÌÑÓÔÖÒÕŠÚÛÜÙÝŸŽáâäàåãéêëèíîïìñóôöòõšúûüùýÿž€\'])@u','',$string);

The result I get is $string unchanged. Why would this be?

回答1:

This works as reverse:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" >
<?php 

$string='The Story of Jewād';
echo preg_replace('@([ā])@','',$string);

?>

So, there is just a syntax problem somewhere ... This isn't a good idea to list all characters as a RegExp. You can do listings something like this:

ltrChars : 'A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02B8\u0300-\u0590\u0800-\u1FFF'+'\u2C00-\uFB1C\uFDFE-\uFE6F\uFEFD-\uFFFF';
rtlChars : '\u0591-\u07FF\uFB1D-\uFDFD\uFE70-\uFEFC';