I'm having a great deal of trouble with encoding in my site.
Here is my current problem. If I go to analize.php?dialog=árbol
whose code is:
<?php
echo $_GET['dialog'];
echo "sabía";
on it I get:
sabÃa
sabía
I'm using ANSI; changing the file to UTF-8 breaks both. I don't understand why this happens, and there isn't any code above this. I don't care how they display, since this file is only used to fetch data from my database, but I need the $_GET value to come through properly so I can include it in the query.
How can this be done?
You cannot send the character "í" in a URL; URLs must use a subset of the ASCII charset. Therefore the URL is encoded to ?dialog=sab%C3%ADa by your browser before being sent to the server. %C3%AD represents the two bytes C3 AD, which is the UTF-8 encoding for the character "í". You can confirm this with var_dump($_SERVER['QUERY_STRING']);. PHP decodes this automatically, and the result is the UTF-8 byte sequence for "sabía", with the "í" encoded as the two bytes C3 AD.
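The round trip described above can be reproduced in a few lines. This is only a sketch: it assumes PHP 7 or later (for the "\u{...}" escape) and uses rawurlencode() to stand in for what the browser does before sending the request.

```php
<?php
// "sabía" as UTF-8 bytes; U+00ED is "í".
$word = "sab\u{00ED}a";

// Stand-in for the browser: percent-encode every non-ASCII byte.
$encoded = rawurlencode($word);
echo $encoded, "\n";            // sab%C3%ADa

// PHP does the equivalent of this when it fills $_GET.
$decoded = urldecode($encoded);
echo bin2hex($decoded), "\n";   // 736162c3ad61
echo strlen($decoded), "\n";    // 6 (bytes, for 5 characters)
```

The hex dump shows the c3 ad pair sitting between "sab" and "a", which is exactly what ends up in $_GET['dialog'].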
Your browser is interpreting this byte sequence using the Windows-1252 or ISO-8859-1 charset. In these encodings the byte C3 represents "Ã", and the byte AD represents a soft hyphen, which is invisible.
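To see the misreading happen outside the browser, the UTF-8 bytes can be deliberately treated as ISO-8859-1 and converted back to UTF-8. This is an illustrative sketch that assumes the mbstring extension is loaded; the variable names are made up for the example.

```php
<?php
// The correct UTF-8 bytes for "sabía": 73 61 62 C3 AD 61.
$utf8 = "sab\u{00ED}a";

// Pretend each byte is one ISO-8859-1 character, as the browser does,
// and re-express the result in UTF-8 so it can be inspected.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($misread), "\n";            // 736162c383c2ad61
echo mb_strlen($misread, 'UTF-8'), "\n"; // 6 characters: s, a, b, Ã, soft hyphen, a
```

The single "í" has become two characters, "Ã" (U+00C3) and a soft hyphen (U+00AD), which matches the "sabÃa" seen on the page.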
Two possible solutions:
- use UTF-8 everywhere (recommended!)
- convert the $_GET values to Windows-1252/ISO-8859-1 (or whatever encoding you want to use on your site) using mb_convert_encoding or iconv (not recommended)
  - even in this case you should set a header that announces to the browser what encoding exactly you're using
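Both options could be sketched roughly as follows. The $siteCharset variable and the from_request() helper are hypothetical names introduced for illustration, not PHP built-ins, and the sketch assumes the browser sent the query string as UTF-8 and that mbstring is available.

```php
<?php
// Assumption: the site's one encoding is defined in a single place.
// 'UTF-8' is the recommended value; 'Windows-1252' would be the fallback.
$siteCharset = 'UTF-8';

// Announce the encoding; header() has to run before any output is sent.
header('Content-Type: text/html; charset=' . $siteCharset);

// Hypothetical helper: request values arrive as UTF-8 bytes, so they only
// need converting when the site uses some other encoding.
function from_request(string $value, string $siteCharset): string
{
    if (strcasecmp($siteCharset, 'UTF-8') === 0) {
        return $value; // option 1: nothing to convert
    }
    // Option 2: convert from UTF-8 to the site's legacy encoding.
    return mb_convert_encoding($value, $siteCharset, 'UTF-8');
}

$dialog = from_request($_GET['dialog'] ?? '', $siteCharset);
```

With $siteCharset set to 'UTF-8' the value passes through untouched; with 'Windows-1252' the two bytes C3 AD collapse to the single byte ED. Either way, the header and the bytes actually sent agree with each other.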
In short, you need to make sure you're using the same encoding everywhere and specify to the browser what encoding exactly that is.