Weird problem with preg_replace and Chinese characters

Posted 2019-07-19 06:45

I have this weird problem: after a preg_replace, some Chinese characters turn into garbage characters. This is the script:

$message = strip_tags(mysql_real_escape_string($_POST['message']),'<img><vid>');
echo $message;
$message = removewhitespace($message);
echo $message;

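function removewhitespace($a)
{
    $a = preg_replace('/^\s+|\s+$/', '', $a);                     // trim leading/trailing whitespace
    $a = preg_replace('/\s+/', ' ', $a);                          // collapse whitespace runs into one space
    $a = preg_replace('/^(\\\r\\\n)+|(\\\r\\\n)+$/', '', $a);     // strip leading/trailing escaped \r\n
    return preg_replace('/(\\\r\\\n\\\r\\\n)+/', '\r\n\r\n', $a); // collapse repeated escaped blank lines
}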

The output is:

好不好你
好不好�

Any ideas?

3 answers
Explosion°爆炸
#2 · 2019-07-19 07:15

Add the 'u' modifier to your patterns (e.g. '/(\\\r\\\n\\\r\\\n)+/u' instead of '/(\\\r\\\n\\\r\\\n)+/') and make sure the subject is in UTF-8.

Only this way will your input be interpreted as UTF-8 instead of a single-byte encoding.
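A minimal sketch of that fix, assuming the source file and the input are both UTF-8. The likely cause is that 你 is encoded as the bytes E4 BD A0, and without 'u' PCRE works byte by byte, so depending on the PCRE build and locale \s may match the lone A0 byte and strip it. The helper name removeWhitespaceUtf8 is made up for illustration and only covers the two \s-based steps of the original function.

```php
<?php
// Hypothetical helper name; mirrors the \s-based steps of removewhitespace().
function removeWhitespaceUtf8(string $text): string
{
    // /u makes PCRE treat pattern and subject as UTF-8, so \s can no longer
    // match a single byte inside a multibyte character.
    $trimmed = preg_replace('/^\s+|\s+$/u', '', $text);
    if ($trimmed === null) {
        // With /u, preg_replace returns null when the subject is not valid UTF-8.
        return $text;
    }
    return preg_replace('/\s+/u', ' ', $trimmed) ?? $trimmed;
}

echo bin2hex('你'), "\n";                           // the UTF-8 bytes of the affected character
echo removeWhitespaceUtf8("  好不好你 \t 好  "), "\n"; // trailing 你 stays intact
```

The patterns that match the escaped \r\n sequences contain only ASCII, so adding 'u' to them is harmless but is not what repairs the broken character.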

叼着烟拽天下
#3 · 2019-07-19 07:22

Use \p{Z} instead of \s in your regex
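A hedged sketch of what that could look like. Two caveats: \p{Z} only covers the Unicode separator categories (spaces, line and paragraph separators), not control characters such as tab, CR, or LF, so it is combined with \s here; and it still needs the 'u' modifier for the subject to be read as UTF-8. The helper name collapseSeparators is an assumption for illustration.

```php
<?php
// Hypothetical helper: collapse runs of Unicode separators and ASCII whitespace
// into a single space. Requires UTF-8 input because of the /u modifier.
function collapseSeparators(string $text): string
{
    // Fall back to the unchanged input if preg_replace fails (e.g. invalid UTF-8).
    return preg_replace('/[\p{Z}\s]+/u', ' ', $text) ?? $text;
}

// U+3000 is the ideographic space, U+00A0 the no-break space; both are in \p{Z}.
echo collapseSeparators("好\u{3000}不好\u{00A0}\t你"), "\n";
```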

爱情/是我丢掉的垃圾
#4 · 2019-07-19 07:35

In UTF-8, non-ASCII characters such as Chinese take up multiple bytes, whereas ASCII characters take up one. You probably need a multibyte-aware replacement such as mb_ereg_replace: http://us2.php.net/manual/en/function.mb-ereg-replace.php
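A rough sketch of the mb_ereg_replace route, assuming the mbstring extension (with its regex support) is available and the input is UTF-8. Note that mb_ereg_replace patterns are written without delimiters. The helper name removeWhitespaceMb is hypothetical, and \A and \z are used instead of ^ and $ to anchor at the start and end of the whole string.

```php
<?php
// Tell the mb_ereg_* functions to interpret patterns and subjects as UTF-8.
mb_regex_encoding('UTF-8');

// Hypothetical helper: trim and collapse whitespace with the multibyte regex functions.
function removeWhitespaceMb(string $text): string
{
    // \A and \z anchor at the very start and end of the string.
    $trimmed = mb_ereg_replace('\A\s+|\s+\z', '', $text);
    if (!is_string($trimmed)) {
        return $text; // keep the input unchanged if the replacement fails
    }
    $collapsed = mb_ereg_replace('\s+', ' ', $trimmed);
    return is_string($collapsed) ? $collapsed : $trimmed;
}

echo removeWhitespaceMb("  好不好你 \t 好  "), "\n";
```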
