remove unicode characters but keep all special and

Posted 2019-07-21 00:48

Question:

I want to use preg_replace to remove all non-ASCII Unicode characters, including Persian characters, from a string while keeping English letters and all special characters. The way I know to do it is:

preg_replace('/[^<>()\/\* a-zA-Z0-9_.-]/u', '', $string);

But I don't really want to list every special character inside []. Is there a shorter way?

Answer 1:

To remove everything except characters in the basic ASCII range, you can use a pattern like the following, which specifies the range by hex codes.

// Given a string with characters in and outside ASCII:
$s = "abcde啅cde衸xtzሴbb()*&bԴ";

// Match HEX 00-7F and remove characters outside that
// by inverting with ^
echo preg_replace('/[^\x00-\x7f]/', '', $s);
// Prints:
// abcdecdextzbb()*&b

The range hex 00-7F also covers the ASCII control characters: NUL, terminal bell, backspace, and so on, plus DEL at hex 7F. If you don't want those non-printable characters in your output, start at ASCII 32 (hex 20, SPACE) and stop at hex 7E (~). Note that this also strips tabs and newlines.

echo preg_replace('/[^\x20-\x7e]/', '', $s);
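If you want to drop control characters but still keep ordinary whitespace such as tabs and line breaks, the two patterns can be wrapped in a small helper. This is only a sketch: the function name keep_printable_ascii and its $keepWhitespace parameter are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): keep printable ASCII,
// optionally keeping tab, line feed, and carriage return as well.
function keep_printable_ascii(string $s, bool $keepWhitespace = true): string
{
    // \x20-\x7e is SPACE through "~"; \x7f (DEL) is a control character.
    $pattern = $keepWhitespace
        ? '/[^\x09\x0a\x0d\x20-\x7e]/'
        : '/[^\x20-\x7e]/';

    // No /u modifier: every byte of a multi-byte UTF-8 sequence is >= 0x80,
    // so byte-wise matching removes those characters completely.
    // preg_replace returns null on failure; fall back to an empty string.
    return preg_replace($pattern, '', $s) ?? '';
}

$s = "abcde啅cde衸xtz\tሴbb()*&bԴ\n";
echo keep_printable_ascii($s);         // tab and newline survive
echo keep_printable_ascii($s, false);  // tab and newline are removed too
```

Leaving out the /u modifier is a deliberate choice here: with /u, preg_replace returns null when the input is not valid UTF-8, whereas the byte-wise version simply strips any byte above hex 7F.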