I want to use preg_replace
to remove all unicode characters including Persian characters from a string and keep English and all special characters. The way I know to do it is :
preg_replace('/[^<>()/\* a-zA-Z0-9_.-]/u', '', $string);
But, I don't really want to include all special characters inside []. Is there any shorter way?!
To remove everything but characters falling in the basic ASCII range, you may use a pattern similar to this to match the range by HEX codes.
// Given a string with characters in and outside ASCII:
$s = "abcde啅cde衸xtzሴbb()*&bԴ";
// Match HEX 00-7F and remove characters outside that
// by inverting with ^
echo preg_replace('/[^\x00-\x7f]/', '', $s);
// Prints:
// abcdecdextzbb()*&b
Using HEX 00-7F will also include the start of the ASCII range, therefore covering things like NUL
, terminal bell, backspace, etc. You may consider starting with ASCII 32 (hex 20) at SPACE
if you don't want your output to include those special non-printable control characters.
echo preg_replace('/[^\x20-\x7f]/', '', $s);