I'm trying to replace the special characters in a PHP string with normal characters (as in replace ó with o and á with a). I tried using the PHP Normalizer::normalize function as in the following code:
if (!Normalizer::isNormalized($word, Normalizer::FORM_C))
{
    echo "original: ".$word;
    $word = Normalizer::normalize($word, Normalizer::FORM_C);
    echo "\tnormalized: ".$word."<br />";
    exit; // see if it worked without having to go through every file
}
However, Normalizer::normalize returned null and the output from that code was:
original: adiós normalized:
Since this method didn't seem to be working, I went and found a function that was supposed to remove special characters. Here is the function:
function normalize ($string) {
    // Map accented characters to ASCII approximations.
    // strtr() compares bytes, so these keys must be saved in the same encoding as $string.
    $table = array(
        'Š'=>'S', 'š'=>'s', 'Đ'=>'Dj', 'đ'=>'dj', 'Ž'=>'Z', 'ž'=>'z', 'Č'=>'C', 'č'=>'c', 'Ć'=>'C', 'ć'=>'c',
        'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
        'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
        'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss',
        'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e',
        'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o',
        'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ü'=>'u', 'ý'=>'y', 'þ'=>'b',
        'ÿ'=>'y', 'Ŕ'=>'R', 'ŕ'=>'r',
    );
    return strtr($string, $table);
}
This code had no noticeable effect, however, and returned the same string that was passed in.
I'm obtaining my strings from *.txt files in Windows 7. I've never been very good at encodings, and would appreciate any help on this issue.
I copied and pasted your code into my editor and something interesting happened. Instead of getting adios, I was getting adjiós. Notice the j in the middle after the d. This was coming from the 'đ'=>'dj' pair in the first line of the table map. Apparently, my editor changed the đ to a regular d, and then it wouldn't convert the ó. I removed this key/value pair and suddenly it worked for me. Are you sure all of your keys are correct in your editor? (Does your editor accept alternative character sets?) Here is my test file (with the đ removed):
When I loop through each character with the 'd' => 'dj' pair in the array map, I correctly get adjios.
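Since the strings come from *.txt files on Windows, it may be worth checking whether they are valid UTF-8 at all, because both Normalizer and a key table saved as UTF-8 assume UTF-8 input. The following is only a sketch: it assumes the mbstring extension is loaded and that the files are Windows-1252 (the question does not say which encoding they use), and to_utf8 is a hypothetical helper name, not something from the original code.

```php
<?php
// Hypothetical helper: return $text as UTF-8.
// Assumption: any input that is not valid UTF-8 is Windows-1252.
function to_utf8(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
}

$raw  = "adi\xF3s";         // "adiós" as Windows-1252 bytes (0xF3 is ó)
$utf8 = to_utf8($raw);
echo bin2hex($utf8), "\n";  // 616469c3b373 (ó is now the two bytes c3 b3)
```

If the converted string then goes through the table-based function, and the PHP source file itself is saved as UTF-8, the ó key and the ó in the input consist of the same bytes and strtr can match them.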
There's a great tip from this page: How to remove diacritics from text? Here's my version of it:
It's good because, unlike the iconv method mentioned above, there's no converting between character sets (they're a minefield).
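For illustration, here is a minimal sketch of the decompose-then-strip approach commonly given for this problem, which may differ from the version this answer refers to. It assumes the intl extension is loaded and that the input is already valid UTF-8; remove_diacritics is a hypothetical function name. Note that it uses Normalizer::FORM_D (decomposition), not FORM_C as in the question: FORM_C composes characters and so never removes accents.

```php
<?php
// Hypothetical helper: strip diacritics by decomposing, then dropping the marks.
// Assumptions: the intl extension is available and $text is valid UTF-8.
function remove_diacritics(string $text): string
{
    // FORM_D splits "ó" into "o" followed by U+0301 (combining acute accent).
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if (!is_string($decomposed)) {
        return $text; // normalization failed (e.g. invalid UTF-8); return input unchanged
    }
    // \p{Mn} matches nonspacing marks, i.e. the now-detached accents.
    return preg_replace('/\p{Mn}+/u', '', $decomposed) ?? $text;
}

echo remove_diacritics("adi\xC3\xB3s"), "\n"; // adios
```

Letters that have no Unicode decomposition, such as đ, ø and ß, pass through unchanged, so a small lookup table may still be needed for those.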