How to remove diacritics from text?

Posted 2019-01-06 15:05

I am building a Swedish website, and the Swedish alphabet includes the letters å, ä, and ö.

I need to make a string entered by a user URL-safe with PHP.

Basically, I need to convert all characters to underscores, EXCEPT these:

 A-Z, a-z, 1-9

and all Swedish letters should be converted like this:

'å' to 'a' and 'ä' to 'a' and 'ö' to 'o' (just remove the marks above the letters).

The rest should become underscores as I said.

I'm not good at regular expressions, so I would appreciate the help!

Thanks

NOTE: NOT URLENCODE... I need to store it in a database, etc., so urlencode won't work for me.

9 answers
forever°为你锁心
#2 · 2019-01-06 15:40

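You don't need fancy regexps to filter the Swedish chars; just use the strtr function to "translate" them. Use the array form, because the three-argument string form works byte by byte and would corrupt multibyte UTF-8 letters:

$your_URL = "www.mäåö.com";
// add further pairs to the map as needed
$good_URL = strtr($your_URL, ["ä" => "a", "å" => "a", "ö" => "o", "ë" => "e"]);
echo $good_URL;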

->output: www.maao.com :)
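
To cover the rest of the question (everything else becomes an underscore), the translation step can be followed by a single preg_replace. The following is only a sketch: the function name url_safe is hypothetical, the map covers just the Swedish letters from the question plus their uppercase forms, and it assumes the input is UTF-8 and that the digits 0-9 (not only 1-9) should be kept.

```php
<?php
// Hypothetical helper; the name url_safe does not come from the thread.
function url_safe(string $text): string
{
    // The array form of strtr() replaces whole substrings, so multibyte
    // UTF-8 letters are matched as units rather than byte by byte.
    $map = [
        'å' => 'a', 'ä' => 'a', 'ö' => 'o',
        'Å' => 'A', 'Ä' => 'A', 'Ö' => 'O',
    ];
    $text = strtr($text, $map);

    // Everything outside A-Z, a-z, 0-9 becomes an underscore.
    // Without the /u flag the pattern works on bytes, so any multibyte
    // character left unmapped turns into one underscore per byte.
    return preg_replace('/[^A-Za-z0-9]/', '_', $text);
}

echo url_safe('räksmörgås och köttbullar'), "\n"; // raksmorgas_och_kottbullar
```

Any letter missing from the map is not transliterated; it simply ends up as underscores, so extend the map if other accented letters are expected.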

Emotional °昔
#3 · 2019-01-06 15:41
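// decompose accented letters into base letter + combining mark (needs PHP's *intl* extension);
// the default form, NFC, would leave "å" as a single precomposed character
$data = normalizer_normalize($data, Normalizer::FORM_D);

// drop the combining marks, leaving the base letters
$data = preg_replace('/\p{Mn}/u', '', $data);

// replace everything NOT in the sets you specified with an underscore
$data = preg_replace("#[^A-Za-z1-9]#", "_", $data);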
The star\"
#4 · 2019-01-06 15:42

Use iconv to convert strings from a given encoding to ASCII, then replace non-alphanumeric characters using preg_replace:

$input = 'räksmörgås och köttbullar'; // UTF-8 encoded
// note: //TRANSLIT output depends on the iconv implementation and the current locale
$input = iconv('UTF-8', 'ASCII//TRANSLIT', $input);
$input = preg_replace('/[^a-zA-Z0-9]/', '_', $input);
echo $input;

Result:

raksmorgas_och_kottbullar