I have this weird problem. After a preg_replace, some Chinese characters turn into garbage characters. This is the script:
$message = strip_tags(mysql_real_escape_string($_POST['message']),'<img><vid>');
echo $message;
$message = removewhitespace($message);
echo $message;
function removewhitespace($a)
{
return preg_replace('/(\\\r\\\n\\\r\\\n)+/','\r\n\r\n', preg_replace('/^(\\\r\\\n)+|(\\\r\\\n)+$/', '', preg_replace('/\s+/', ' ', preg_replace('/^\s+|\s+$/', '', $a))));
}
The output of the two echo calls is:
好不好你
好不好�
Any ideas?
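Add the 'u' modifier to your patterns (e.g. '/(\\\r\\\n\\\r\\\n)+/u' instead of '/(\\\r\\\n\\\r\\\n)+/') and make sure the subject is in UTF-8. Only this way will your input be interpreted as UTF-8 instead of a single-byte encoding.
Without the modifier the subject is matched byte by byte. In UTF-8, 你 is the byte sequence E4 BD A0, and depending on the locale \s can match the byte A0 (a non-breaking space in Latin-1), so the trailing-whitespace pattern '/^\s+|\s+$/' most likely strips the last byte of the character and leaves an invalid sequence behind.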
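As a rough illustration of the 'u' modifier, here is a minimal sketch of the trim-and-collapse step from the question. It assumes the input is valid UTF-8. The function name collapse_whitespace_utf8 is made up for this example, and returning the input unchanged on error is an arbitrary choice, not something from the original script.

```php
<?php
// Hypothetical helper (the name is not from the original script): trims and
// collapses whitespace in a string that is assumed to be valid UTF-8.
function collapse_whitespace_utf8(string $text): string
{
    // The /u modifier makes PCRE treat both pattern and subject as UTF-8,
    // so \s is matched against whole characters rather than single bytes.
    $trimmed = preg_replace('/^\s+|\s+$/u', '', $text);
    if ($trimmed === null) {
        // preg_replace() returns null on error, for example when the subject
        // is not valid UTF-8. Falling back to the input is a sketch-only choice.
        return $text;
    }

    // Collapse every remaining run of whitespace into a single space.
    $collapsed = preg_replace('/\s+/u', ' ', $trimmed);
    return $collapsed ?? $trimmed;
}

echo collapse_whitespace_utf8("  好不好你 \t 好  "), "\n";
```

If the function hands back its input untouched, checking preg_last_error() should tell you whether the subject was rejected as malformed UTF-8.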
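Use \p{Z} instead of \s in your regex, together with the 'u' modifier, since without it the pattern still works on single bytes. Note that \p{Z} covers Unicode separator characters such as spaces, but not tabs or line breaks.
In UTF-8, non-ASCII characters take up multiple bytes, whereas ASCII characters take up one. You probably need a multibyte-aware replacement function such as mb_ereg_replace:
http://us2.php.net/manual/en/function.mb-ereg-replace.php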
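For the mb_ereg_replace route, something along these lines might work. It is only a sketch: it assumes the mbstring extension is built with its regex support, the function name collapse_whitespace_mb is invented for this example, and the \A and \z anchors are used instead of ^ and $ because the mbstring regex engine can treat ^ and $ as line anchors.

```php
<?php
// Tell the mbstring regex functions that patterns and subjects are UTF-8.
mb_regex_encoding('UTF-8');

// Hypothetical helper (the name is not from the original script).
function collapse_whitespace_mb(string $text): string
{
    // mb_ereg_replace() takes the pattern without delimiters.
    // \A and \z anchor to the very start and end of the whole string.
    $trimmed = mb_ereg_replace('\A\s+|\s+\z', '', $text);
    if (!is_string($trimmed)) {
        // false or null signals an error; keep the input in this sketch.
        return $text;
    }

    // Collapse every remaining run of whitespace into a single space.
    $collapsed = mb_ereg_replace('\s+', ' ', $trimmed);
    return is_string($collapsed) ? $collapsed : $trimmed;
}

echo collapse_whitespace_mb("  好不好你   好  "), "\n";
```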