I am trying to replace accented characters with their unaccented equivalents. Below is what I am currently doing.
$string = "Éric Cantona";
$strict = strtolower($string);
echo "After Lower: ".$strict;
$patterns[0] = '/[á|â|à|å|ä]/';
$patterns[1] = '/[ð|é|ê|è|ë]/';
$patterns[2] = '/[í|î|ì|ï]/';
$patterns[3] = '/[ó|ô|ò|ø|õ|ö]/';
$patterns[4] = '/[ú|û|ù|ü]/';
$patterns[5] = '/æ/';
$patterns[6] = '/ç/';
$patterns[7] = '/ß/';
$replacements[0] = 'a';
$replacements[1] = 'e';
$replacements[2] = 'i';
$replacements[3] = 'o';
$replacements[4] = 'u';
$replacements[5] = 'ae';
$replacements[6] = 'c';
$replacements[7] = 'ss';
$strict = preg_replace($patterns, $replacements, $strict);
echo "Final: ".$strict;
This gives me:
After Lower: éric cantona
Final: ric cantona
I want the output to be eric cantona.
Can anyone help me with where I am going wrong?
I know this question was asked a long, long time ago...
I was looking for a short and elegant solution, but couldn't find a satisfying one, for two reasons:
First, most of the existing solutions replace a list of characters with a list of other characters. Unfortunately, that requires a specific encoding for the PHP script file itself, which might be unwanted.
Second, using iconv seems to be a good way, but it's not enough: a converted character can come out as one or two characters, or the conversion can fail outright.
So I wrote this small function, which does the job:
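The function itself is not reproduced above. As a rough sketch of one approach that avoids both problems (an assumption, not necessarily the author's code): convert the string to HTML entities, keep only the base letter of each accent entity, then decode whatever is left. The name strip_accents_via_entities is hypothetical, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper: strips accents by round-tripping through HTML entities.
// Assumes $str is valid UTF-8. The script file's own encoding does not matter,
// because no accented literal appears in the source.
function strip_accents_via_entities(string $str): string
{
    // "É" becomes "&Eacute;", "ç" becomes "&ccedil;", "&" becomes "&amp;".
    $str = htmlentities($str, ENT_NOQUOTES, 'UTF-8');

    // The sharp s would otherwise come out as "sz" in the ligature step below.
    $str = str_replace('&szlig;', 'ss', $str);

    // Keep only the base letter of accent entities: "&Eacute;" -> "E".
    $str = preg_replace(
        '~&([A-Za-z])(?:acute|caron|cedil|circ|grave|ring|slash|tilde|uml);~',
        '$1',
        $str
    );

    // Ligatures keep their two letters: "&aelig;" -> "ae", "&OElig;" -> "OE".
    $str = preg_replace('~&([A-Za-z]{2})lig;~', '$1', $str);

    // Restore every entity that was not an accent (for example "&amp;").
    return html_entity_decode($str, ENT_NOQUOTES, 'UTF-8');
}

// "\xC3\x89" is É encoded as UTF-8.
echo strip_accents_via_entities("\xC3\x89ric Cantona"), "\n"; // Eric Cantona
```

Characters that have no accent-style entity (ð, for instance) pass through unchanged, so this only covers what the HTML entity table covers.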
I found this on the php.net page for the preg_replace function:
If you have encoding issues you may get something like "ZacarÃÂas FerreÃÂra"; in that case, decode the string first and then use the code above.
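The php.net snippet is not reproduced here. As a sketch of the same preg_replace idea applied to the question's own patterns, two changes matter: inside [...] a | is a literal pipe rather than an alternation, and without the /u modifier PCRE matches single bytes, so a multi-byte UTF-8 character in a class is not treated as one character. The function name below is hypothetical, and the sketch assumes the file is saved as UTF-8 and that mbstring is available.

```php
<?php
// Sketch only: assumes this file is saved as UTF-8 and mbstring is loaded.
function replace_accents_preg(string $string): string
{
    // mb_strtolower() handles multi-byte characters such as "É".
    $string = mb_strtolower($string, 'UTF-8');

    // No '|' inside the classes, and /u so whole UTF-8 characters are matched.
    $patterns = [
        '/[áâàåä]/u',
        '/[ðéêèë]/u',
        '/[íîìï]/u',
        '/[óôòøõö]/u',
        '/[úûùü]/u',
        '/æ/u',
        '/ç/u',
        '/ß/u',
    ];
    $replacements = ['a', 'e', 'i', 'o', 'u', 'ae', 'c', 'ss'];

    return preg_replace($patterns, $replacements, $string);
}

echo replace_accents_preg('Éric Cantona'), "\n"; // eric cantona
```

The character lists are the ones from the question, so anything outside them (ñ or ý, for example) is left untouched.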
You can take this as a basis. It comes from WordPress, where it is used to generate pretty URLs (the entry point is the slugify() function):
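The WordPress code is not reproduced here. Purely to illustrate the general shape of such a slug function, here is a much smaller sketch. It is not WordPress's code: the name slugify_sketch and the tiny character map are made up for the example, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical sketch, not WordPress code. The map is deliberately tiny;
// extend it with the characters you expect. Assumes a UTF-8 source file.
function slugify_sketch(string $text): string
{
    $map = [
        'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ä' => 'A', 'Ç' => 'C',
        'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ñ' => 'N',
        'Ö' => 'O', 'Ü' => 'U',
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ä' => 'a', 'ç' => 'c',
        'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e', 'ñ' => 'n',
        'ö' => 'o', 'ü' => 'u', 'ß' => 'ss',
    ];

    // strtr() with an array replaces whole keys, so multi-byte keys are safe.
    $text = strtr($text, $map);

    // Mapped characters are plain ASCII now, so strtolower() is enough for them.
    $text = strtolower($text);

    // Collapse every run of other characters into a single hyphen.
    $text = preg_replace('/[^a-z0-9]+/', '-', $text);

    return trim($text, '-');
}

echo slugify_sketch('Éric Cantona'), "\n"; // eric-cantona
```

Any character missing from the map ends up as a hyphen, which is why a real implementation needs a far larger table.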
I tried all sorts of things based on the variations listed in the answers, but only the following worked:
In PHP 5.4, the intl extension provides a new class named Transliterator. I believe that's the best way to remove diacritics, for two reasons:
Transliterator is based on ICU, so you're using the tables of the ICU library. ICU is a great project, developed over the years to provide comprehensive tables and functionality. Whatever table you write yourself will never be as complete as the one from ICU.
In UTF-8, the same character can be represented in different ways. For example, the character ñ can be stored as a single (multi-byte) character, or as the combination of the characters ˜ (multi-byte) and n. In addition, some characters in Unicode are homographs: they look the same while having different code points. For this reason it's also important to normalize the string.
Here's some sample code, taken from an old answer of mine:
Result:
The first argument for the Transliterator class performs the removal of diacritics as well as the normalization of the string.
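The sample code and its output are not reproduced above. A minimal sketch of the approach described, assuming the intl extension is loaded, could look like this. The function name remove_diacritics_icu and the exact rule string are assumptions for illustration: the rules convert to Latin, fold to ASCII where ICU knows how, decompose (NFD), drop the combining marks, lowercase, and recompose (NFC).

```php
<?php
// Sketch only: requires the intl extension (PHP 5.4 or later).
function remove_diacritics_icu(string $text): string
{
    static $transliterator = null;

    if ($transliterator === null) {
        // Rule string chosen for this example, not taken from the text above.
        $rules = ':: Any-Latin; :: Latin-ASCII; :: NFD; '
               . ':: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;';
        $transliterator = Transliterator::createFromRules(
            $rules,
            Transliterator::FORWARD
        );
        if ($transliterator === null) {
            throw new RuntimeException(intl_get_error_message());
        }
    }

    $result = $transliterator->transliterate($text);
    if ($result === false) {
        throw new RuntimeException($transliterator->getErrorMessage());
    }

    return $result;
}
```

With this in place, remove_diacritics_icu("\xC3\xB1") (precomposed ñ) and remove_diacritics_icu("n\xCC\x83") (n followed by a combining tilde) should both come back as a plain n, which is the normalization point made above.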
If you have the intl extension (http://php.net/manual/en/book.intl.php) available, this will solve your problem:
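The snippet this answer refers to is not reproduced here. One way to do it with intl, offered as a sketch rather than as the answer's original code, is the procedural transliterator_transliterate() function with a chain of predefined ICU transform IDs. The wrapper name to_ascii_lower is hypothetical.

```php
<?php
// Sketch only: requires the intl extension.
function to_ascii_lower(string $text): string
{
    // Any-Latin: convert other scripts to Latin; Latin-ASCII: fold accented
    // Latin letters to ASCII; Lower(): lowercase the result.
    $result = transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', $text);

    if ($result === false) {
        throw new RuntimeException(intl_get_error_message());
    }

    return $result;
}
```

Calling to_ascii_lower('Éric Cantona') should give eric cantona, the output asked for in the question.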