I need a function that removes from a string every character not listed in a pattern, while keeping letters from foreign languages. I know preg_replace supports \p{...} Unicode property escapes, but I can't get them working for some reason.
I currently use this to remove all the crap from a string:
$main_content=preg_replace("/[^a-zA-Z0-9`~!@#\$%\^&\*\(\)-_=\+\\|\,<\.>\/\?;:'\"\[\]\s]/", "", $main_content); //remove all symbols that do NOT match these
Put simply, the function should keep all standard letters, digits and common symbols such as +-!@#$, and remove everything else, such as © and ™. If there is a better way to write this preg_replace than mine, please let me know.
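A note on the pattern itself: inside a character class, an unescaped `-` between two characters defines a range, so `\)-_` matches every character from `)` to `_`. That range happens to include the backslash, which `\\|` does not add on its own, because PHP turns `"\\|"` into `\|` (an escaped pipe) before PCRE sees it. If the goal is simply "all printable ASCII plus whitespace", a single range is shorter and harder to get wrong. A minimal sketch, assuming PHP 7+ and that `{` and `}` (which the original list omits) may also be kept; `strip_non_ascii` is a hypothetical name:

```php
<?php
// Hypothetical helper: keeps printable ASCII (\x20 space through \x7E ~)
// plus whitespace (\s: tab, newline, etc.) and removes every other byte.
// Assumption: all printable ASCII is wanted, including { and }.
// The single-quoted pattern reaches PCRE as written, so there is no
// double-quote escaping of \$, \\ or \" to reason about.
function strip_non_ascii(string $text): string
{
    return preg_replace('/[^\x20-\x7E\s]/', '', $text);
}

// The UTF-8 bytes of €, © and ™ are all >= 0x80, so they are removed.
echo strip_non_ascii('Price: 10€ © 2024 ™!'), "\n";
```

Like the original pattern, this version also drops foreign letters, since their UTF-8 bytes are outside the ASCII range.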
Now I want the function to keep letters from foreign languages as well, so I modified it to:
$main_content=preg_replace("/[^\p{L}a-zA-Z0-9`~!@#\$%\^&\*\(\)-_=\+\\|\,<\.>\/\?;:'\"\[\]\s]/", "", $main_content); //remove all symbols that do NOT match these
(Note the added \p{L}.) Unfortunately, it doesn't work as expected: when I echo the text, the foreign letters are not removed (good), but they are displayed as � (bad).
How do I fix it?
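Without the u modifier, PCRE treats the subject as a sequence of single bytes, so \p{L} is tested against each byte of a multibyte UTF-8 character separately. Some of those bytes happen to count as letters and are kept while the others are removed, which leaves broken UTF-8 sequences that are displayed as �. Add the u modifier so that the pattern and the subject are handled as UTF-8:
$main_content=preg_replace("/[^\p{L}a-zA-Z0-9`~!@#\$%\^&\*\(\)-_=\+\\|\,<\.>\/\?;:'\"\[\]\s]/u", "", $main_content); //remove all symbols that do NOT match these
Notice the u added after the closing /.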
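A minimal sketch of the same fix wrapped in a function, assuming PHP 7+ and UTF-8 input. The name `strip_junk` is hypothetical, and the character class is a simplified variant rather than the exact list from the question (assumption: any Unicode letter, combining mark, printable ASCII character or whitespace should be kept). `\p{M}` is included because some scripts write vowel signs and accents as separate combining marks, which `\p{L}` alone would strip. The last lines show the byte-by-byte behaviour without `/u` for comparison.

```php
<?php
// Hypothetical helper; a sketch, not a drop-in copy of the question's list.
// With the u modifier PCRE reads pattern and subject as UTF-8, so each
// \p{...} test sees a whole character rather than a single byte.
// Kept: Unicode letters (\p{L}), combining marks (\p{M}),
// printable ASCII (\x20-\x7E) and whitespace (\s). Everything else goes.
function strip_junk(string $text): string
{
    $clean = preg_replace('/[^\p{L}\p{M}\x20-\x7E\s]/u', '', $text);

    // With /u, preg_replace returns null instead of a string when the
    // subject is not valid UTF-8.
    if ($clean === null) {
        throw new InvalidArgumentException('preg_replace failed; is the input valid UTF-8?');
    }
    return $clean;
}

echo strip_junk('Grüße, Привет! © 2024 ™ $5'), "\n";

// For comparison, the same class without /u works byte by byte: for "é"
// (bytes C3 A9) the lead byte 0xC3 is read as code point U+00C3 (Ã, a
// letter) and kept, while 0xA9 (©) is removed, leaving an incomplete
// UTF-8 sequence that shows up as �.
$bytewise = preg_replace('/[^\p{L}\p{M}\x20-\x7E\s]/', '', 'é');
echo bin2hex($bytewise), "\n";
```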