$_GET variable with messed up encoding

Posted 2019-02-18 10:49

Question:

I'm having a great deal of trouble with encoding in my site.

This is my current problem: if I go to analize.php?dialog=sabía, where the code of analize.php is:

<?php
echo $_GET['dialog'];
echo "sabía";

I get:

sabía
sabía

My file is saved as ANSI; changing it to UTF-8 breaks both lines. I don't understand why this happens, and there isn't any code above this. I don't care how the values display, since this file is only used to fetch data from my database, but I need $_GET to come through correctly so I can include it in the query.

How can this be done?

Answer 1:

You cannot send the character "í" in a URL as-is, because URLs must use a subset of the ASCII charset. Your browser therefore encodes the URL to ?dialog=sab%C3%ADa before sending it to the server. %C3%AD represents the two bytes C3 AD, which is the UTF-8 encoding of the character "í". You can confirm this with var_dump($_SERVER['QUERY_STRING']);. PHP decodes this automatically, so $_GET['dialog'] holds the UTF-8 byte sequence for "sabía", with the "í" encoded as the two bytes C3 AD.
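The round trip described above can be reproduced without a browser. This is a minimal sketch: the variable names are made up for illustration, and rawurlencode()/rawurldecode() only stand in for what the browser and PHP do with this particular value.

```php
<?php
// "sabía" written with explicit UTF-8 bytes (C3 AD for "í"), so the
// example does not depend on how this source file itself is saved.
$word = "sab\xC3\xADa";

// Roughly what the browser does before sending the request.
$encoded = rawurlencode($word);

// Roughly what PHP does when it fills $_GET.
$decoded = rawurldecode($encoded);

echo $encoded, "\n";            // sab%C3%ADa
echo bin2hex($decoded), "\n";   // 736162c3ad61 (note the c3 ad pair)
```

The hex dump is the useful part: it shows the raw bytes PHP holds, regardless of how a terminal or browser chooses to render them.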

Your browser is interpreting this byte sequence using the Windows-1252 or ISO-8859-1 charset. In these encodings the byte C3 represents "Ã", and the byte AD represents a soft hyphen, which is invisible. That is why you see "sabÃa".
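The misreading can be simulated in PHP as well. The sketch below assumes the mbstring extension is loaded and uses ISO-8859-1 as the "wrong" charset; for these two bytes Windows-1252 gives the same characters.

```php
<?php
// The two UTF-8 bytes of "í".
$utf8Bytes = "\xC3\xAD";

// Treat each byte as one ISO-8859-1 character and re-express the result
// as UTF-8. This mimics a browser that assumes the wrong charset.
$misread = mb_convert_encoding($utf8Bytes, 'UTF-8', 'ISO-8859-1');

// c3 83 is U+00C3 ("Ã"), c2 ad is U+00AD (soft hyphen).
echo bin2hex($misread), "\n";               // c383c2ad
echo mb_strlen($misread, 'UTF-8'), "\n";    // 2
```

One character has become two, and the second one is invisible, which matches the output in the question.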

Two possible solutions:

  1. use UTF-8 everywhere (recommended!)

    • save your source code as UTF-8
    • output a header that forces the browser to interpret the site as UTF-8:

      header('Content-Type: text/html; charset=utf-8');
      
  2. convert the $_GET values to Windows-1252/ISO-8859-1 (or whatever encoding you want to use on your site) using mb_convert_encoding or iconv (not recommended)

    • even in this case you should set a header that announces to the browser what encoding exactly you're using
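Both options can be sketched in a few lines. The helper names send_utf8_header() and utf8_to_legacy() are hypothetical, mbstring is assumed to be available, and the fallback value exists only so the snippet runs outside a web request.

```php
<?php
// Hypothetical helpers for illustration; not part of the original answer.

// Option 1: leave the UTF-8 bytes alone and tell the browser they are UTF-8.
function send_utf8_header(): void
{
    // header() must run before any output has been sent.
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=utf-8');
    }
}

// Option 2: convert a UTF-8 value to a single-byte encoding.
// $target is an assumption; use whatever encoding the rest of the site uses.
function utf8_to_legacy(string $value, string $target = 'ISO-8859-1'): string
{
    return mb_convert_encoding($value, $target, 'UTF-8');
}

// Option 1 in use.
send_utf8_header();
$dialog = $_GET['dialog'] ?? "sab\xC3\xADa";   // fallback only for standalone runs
echo htmlspecialchars($dialog, ENT_QUOTES, 'UTF-8'), "\n";

// Option 2 in use: "í" becomes the single byte ED.
echo bin2hex(utf8_to_legacy($dialog)), "\n";
```

With option 2 the Content-Type header would have to announce the legacy charset instead of utf-8. Either way, the encoding of the value and the encoding declared to the browser have to agree.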

In short, you need to make sure you're using the same encoding everywhere and specify to the browser what encoding exactly that is.



Tags: php encoding