I have the following problem: I have a PHP page that generates XML. The script receives an ID, queries the database for the record with that ID, and outputs the result as XML. Some of the values in the DB contain special characters such as é, è, ö, ä, ü. The PHP file is saved as UTF-8 (I tried saving it as UTF-16, but the output looked horrible). The PHP file looks like this:
<?php
header('Content-Type: text/xml');
echo '<?xml version="1.0" encoding="UTF-16" standalone="yes" ?>';
include ('config.php');
echo '<response>';
$id = $_GET["id"];
// configure server request
$db_q1 = "SELECT * from table WHERE id = '" . $id . "'";
$db_r1 = mysql_query($db_q1);
while($rec = mysql_fetch_assoc($db_r1)) {
$string = $rec["name"];
...
}
echo '<name>' . $string . '</name>';
echo '</response>';
?>
This file works well with any db-entry that has no special characters, but when it does, I get the following error: Encoding error. This happens precisely at the point of the special character, i.e., the xml is parsed up to there. After researching the error, I figured it must be due to the special characters (also because the xml was parsed till there and worked for other entries).
I have tried the following, based on research here and elsewhere and on the PHP manual pages for the functions involved:
- $string = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $string); --> works, but cuts off the string at the special character ("Brötchen" --> "Br").
- $string = str_replace(search, replace, $string); --> Has simply no effect. I get the same error.
- Using a function which will return an altered string, e.g. from here, which also did not have any effect.
- htmlentities($string) --> I get the following error: "Entity 'uuml' not defined".
- htmlspecialchars() returns the original error.
- urlencode() is not useful here, because it encodes the string for use in a URL (spaces become "+" and the special characters are percent-encoded).
- The following function (unfortunately, I don't remember where I got it, otherwise I would credit the person who wrote it, as it seems to be useful):
function remove_accents($str) {
    $from = array(
        "á", "à", "â", "ã", "ä", "é", "è", "ê", "ë", "í", "ì", "î", "ï",
        "ó", "ò", "ô", "õ", "ö", "ú", "ù", "û", "ü", "ç",
        "Á", "À", "Â", "Ã", "Ä", "É", "È", "Ê", "Ë", "Í", "Ì", "Î", "Ï",
        "Ó", "Ò", "Ô", "Õ", "Ö", "Ú", "Ù", "Û", "Ü", "Ç"
    );
    $to = array(
        "a", "a", "a", "a", "a", "e", "e", "e", "e", "i", "i", "i", "i",
        "o", "o", "o", "o", "o", "u", "u", "u", "u", "c",
        "A", "A", "A", "A", "A", "E", "E", "E", "E", "I", "I", "I", "I",
        "O", "O", "O", "O", "O", "U", "U", "U", "U", "C"
    );
    return str_replace($from, $to, $str);
}
$string = remove_accents($string);
I really don't know what I'm doing wrong and why so many different methods fail at accomplishing this task. Help is much appreciated!
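One thing that may be worth checking in the script above: the XML declaration announces UTF-16 while the file is saved as UTF-8, and htmlentities() produces HTML entities such as &uuml; that XML does not define. Below is a minimal sketch of an alternative, assuming the value coming from the database is already valid UTF-8. The helper name build_response is hypothetical and not part of the original script.

```php
<?php
// Hypothetical helper: builds the response document for a single name.
// Assumption: $name is already valid UTF-8.
function build_response(string $name): string
{
    // Declare the encoding the bytes are actually in (UTF-8, not UTF-16).
    $xml = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>';

    // ENT_XML1 escapes only the characters XML reserves (&, <, >, quotes),
    // so "ö" stays a literal character instead of becoming an HTML entity.
    $escaped = htmlspecialchars($name, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    return $xml . '<response><name>' . $escaped . '</name></response>';
}

$out = build_response('Brötchen & Käse');
```

With this approach the accented letters are sent as they are, and only the ampersand is escaped (as &amp;amp;), which an XML parser can read as long as the declared encoding matches the actual bytes.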
This is a function from WordPress core that removes accents from letters; it should work:
For others who experience similar problems: I also had problems with string replacement on variables that came in via POST.
This worked for me:
utf8_encode() did the trick!
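utf8_encode() converts ISO-8859-1 to UTF-8, so the fact that it helps suggests the strings were arriving as Latin-1 rather than UTF-8. Since utf8_encode() is deprecated as of PHP 8.2, the same conversion can be sketched with iconv(). The helper name latin1_to_utf8 is made up for illustration, and the guard against double-encoding is a heuristic added here, not something from the answer above.

```php
<?php
// Hypothetical helper: converts a string that is assumed to be
// ISO-8859-1 (Latin-1) into UTF-8.
function latin1_to_utf8(string $s): string
{
    // Heuristic guard: preg_match() with the /u flag only succeeds on valid
    // UTF-8, so strings that are already UTF-8 are returned unchanged
    // instead of being encoded a second time.
    if (preg_match('//u', $s) === 1) {
        return $s;
    }
    return iconv('ISO-8859-1', 'UTF-8', $s);
}

$latin1 = "Br\xF6tchen";            // "ö" as the single Latin-1 byte 0xF6
$utf8   = latin1_to_utf8($latin1);  // "ö" becomes the two bytes 0xC3 0xB6
```

If the data really is Latin-1 at the source, it may be cleaner to set the database connection charset to UTF-8 so that no per-string conversion is needed.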
I only needed lower case; if you need upper case, you will have to add the uppercase variants of these special characters.