I'm having a great deal of trouble with encoding on my site.
My current problem: if I go to analize.php?dialog=árbol
whose code is:
<?php
echo $_GET['dialog'];
echo "sabía";
I get:
sabÃa
sabía
I'm using ANSI; changing to UTF-8 breaks both. I don't understand why this happens, and there isn't any code above this. I don't care how they display, since this file is only used to fetch data from my database, but I need $_GET to come through properly so I can include it in the query.
How can this be done?
You cannot send the character "í" in a URL; URLs must use a subset of the ASCII charset. Therefore the URL is encoded to
?dialog=sab%C3%ADa
by your browser before being sent to the server. %C3%AD represents the two bytes C3 AD, which is the UTF-8 encoding for the character "í". You can confirm this with var_dump($_SERVER['QUERY_STRING']);. This is automatically decoded by PHP; the result is the UTF-8 byte sequence for "sabía", with the "í" encoded as the two bytes C3 AD.

Your browser is interpreting this byte sequence using the Windows-1252 or ISO-8859-1 charset. The byte C3 represents "Ã" in this encoding; the byte AD represents a soft hyphen and is invisible.

Two possible solutions:
1. Use UTF-8 everywhere (recommended!). Output a header that forces the browser to interpret the site as UTF-8, for example header('Content-Type: text/html; charset=utf-8');
2. Convert the $_GET values to Windows-1252/ISO-8859-1 (or whatever encoding you want to use on your site) using mb_convert_encoding or iconv (not recommended).

In short, you need to make sure you're using the same encoding everywhere, and tell the browser exactly which encoding that is.
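
The byte-level explanation and both options can be checked with a small script. This is only a minimal sketch: $raw is a hypothetical stand-in for the encoded value the browser sends, the variable names are illustrative, and it assumes the mbstring extension is available.

```php
<?php
// Option 1 (recommended): declare UTF-8 before any output is sent,
// so the browser decodes the page with the same charset as the data.
header('Content-Type: text/html; charset=utf-8');

// Hypothetical stand-in for the encoded value that arrives in the query string.
$raw = 'sab%C3%ADa';

// PHP does this decoding itself when it populates $_GET.
$value = urldecode($raw);
$hex = bin2hex($value); // the "í" shows up as the two bytes c3 ad

// Simulate a browser that reads those UTF-8 bytes as ISO-8859-1:
// C3 turns into "Ã" and AD into an invisible soft hyphen.
$misread = mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');

// Option 2 (not recommended): convert the value to the legacy encoding
// used by the rest of the site. Here "í" becomes the single byte ED.
$legacy = mb_convert_encoding($value, 'Windows-1252', 'UTF-8');
$legacyHex = bin2hex($legacy);

// With the UTF-8 header above, the value displays as intended.
echo htmlspecialchars($value, ENT_QUOTES, 'UTF-8'), "\n";
```

Dumping $hex and $legacyHex side by side makes the difference between the two encodings visible even when the terminal or browser would otherwise hide it.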