Strip down everything, except alphanumeric and Eur

2020-07-17 15:51发布

问题:

I am working on validating my commenting script, and I need to strip down all non-alphanumeric chars except those used in Western Europe.

My plan is to regex out all non-alphanumeric characters with:

preg_replace("/[^A-Za-z0-9 ]/", '', $string);

But that so far strips out all European characters and a £ sign, so "Café Rouge" becomes "Caf Rouge".

How can I add an array of Euro chars to the above regex.

The array is:

£, €, 
á, à, â, ä, æ, ã, å,
è, é, ê, ë,
î, ï, í, ì,
ô, ö, ò, ó, ø, õ,
û, ü, ù, ú,
ÿ,
ñ,
ß

I use UTF-8

SOLUTION:

$comment = preg_replace('/[^\p{Latin}\d\s\p{P}]/u', '', $comment);

and

$name = preg_replace('/[^\p{Latin}]/u', '', $name);

$name aslo removes punctuation marks and spaces

Thanks for quick replies

回答1:

preg_replace('/[^\p{Latin}\d ]/u', '', $str);


回答2:

echo preg_replace('/[^A-Z0-9 £€áàâä...]/ui', '', $string);

The important part is the /u flag. Make sure your source code and $string are UTF-8 encoded.

I still think it's the wrong approach, because it severely limits what your users can enter and it will annoy some, but whatever floats your boat... BTW, your list contains no punctuation characters.