I am validating input for my commenting script, and I need to strip out all non-alphanumeric characters except the accented letters and symbols used in Western Europe.
My plan is to regex out all non-alphanumeric characters with:
preg_replace("/[^A-Za-z0-9 ]/", '', $string);
But that also strips out all accented European characters and the £ sign, so "Café Rouge" becomes "Caf Rouge".
How can I add an array of European characters to the above regex?
The array is:
£, €,
á, à, â, ä, æ, ã, å,
è, é, ê, ë,
î, ï, í, ì,
ô, ö, ò, ó, ø, õ,
û, ü, ù, ú,
ÿ,
ñ,
ß
I use UTF-8.
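One way to do literally what the question asks, sketched under the assumption that the list above is the complete set of extra characters to keep: build the character class from the array and add the /u modifier so the pattern is matched as UTF-8 rather than byte by byte. The function name strip_to_whitelist is made up for this example.

```php
<?php
// Hypothetical helper: keeps ASCII letters, digits, spaces and the listed characters.
function strip_to_whitelist(string $input): string
{
    // The characters from the list in the question. It is lowercase only,
    // so uppercase accented letters such as É are removed unless added here.
    $extra = [
        '£', '€',
        'á', 'à', 'â', 'ä', 'æ', 'ã', 'å',
        'è', 'é', 'ê', 'ë',
        'î', 'ï', 'í', 'ì',
        'ô', 'ö', 'ò', 'ó', 'ø', 'õ',
        'û', 'ü', 'ù', 'ú',
        'ÿ', 'ñ', 'ß',
    ];

    // preg_quote() is a precaution in case a regex metacharacter is added to the list later.
    $class = '';
    foreach ($extra as $char) {
        $class .= preg_quote($char, '/');
    }

    // The /u modifier makes PCRE treat the pattern and the subject as UTF-8.
    $result = preg_replace('/[^A-Za-z0-9 ' . $class . ']/u', '', $input);

    // preg_replace() returns null on error (for example malformed UTF-8 input).
    return $result ?? '';
}

echo strip_to_whitelist('Café Rouge £5 <b>'), "\n"; // Café Rouge £5 b
```

This file itself has to be saved as UTF-8, otherwise the literals in $extra will not match the incoming text.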
SOLUTION:
$comment = preg_replace('/[^\p{Latin}\d\s\p{P}]/u', '', $comment);
and
$name = preg_replace('/[^\p{Latin}]/u', '', $name);
Note that the $name pattern also removes punctuation marks and spaces.
Thanks for the quick replies.
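A short check of what the two patterns above keep and remove may help. One caveat: £ and € belong to the Unicode currency-symbol category (Sc), not to punctuation (P), so the comment pattern as written removes them. The third pattern below, with \p{Sc} added, is an assumed variant and not part of the original solution. Also note that \p{Latin} covers every Latin-script letter, which is broader than the list in the question.

```php
<?php
// The two patterns from the solution, unchanged.
$commentPattern = '/[^\p{Latin}\d\s\p{P}]/u';
$namePattern    = '/[^\p{Latin}]/u';

// Assumed variant: additionally allow currency symbols (Unicode category Sc).
$commentWithCurrency = '/[^\p{Latin}\d\s\p{P}\p{Sc}]/u';

// Punctuation and accented letters survive, but the pound sign does not.
echo preg_replace($commentPattern, '', 'Café Rouge: £5!'), "\n";      // Café Rouge: 5!

// With \p{Sc} in the class the pound sign is kept.
echo preg_replace($commentWithCurrency, '', 'Café Rouge: £5!'), "\n"; // Café Rouge: £5!

// The name pattern drops the hyphen and the space along with everything non-Latin.
echo preg_replace($namePattern, '', 'Jean-Luc Müller'), "\n";         // JeanLucMüller
```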
The important part is the /u flag. Make sure your source code and $string are UTF-8 encoded.
I still think it's the wrong approach, because it severely limits what your users can enter and it will annoy some, but whatever floats your boat... BTW, your list contains no punctuation characters.
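To illustrate why the encoding matters: with the /u flag, PCRE refuses malformed UTF-8, and preg_replace() returns null instead of a cleaned string. The sketch below guards against that case. The function name clean_comment is hypothetical, and how to handle rejected input (re-encode it, show an error) is left open.

```php
<?php
// Hypothetical helper: returns the cleaned comment, or null if the input is not valid UTF-8.
function clean_comment(string $comment): ?string
{
    $cleaned = preg_replace('/[^\p{Latin}\d\s\p{P}]/u', '', $comment);

    // On malformed input preg_replace() returns null and
    // preg_last_error() reports PREG_BAD_UTF8_ERROR.
    if ($cleaned === null) {
        return null;
    }

    return $cleaned;
}

// Valid UTF-8: é encoded as the two bytes C3 A9.
var_dump(clean_comment("Caf\xC3\xA9 Rouge"));

// ISO-8859-1 input: é as the single byte E9, which is not valid UTF-8.
var_dump(clean_comment("Caf\xE9 Rouge"));
```

The first call returns the string unchanged; the second returns null, which is what happens when a form submits text in a different encoding.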