Fixing Broken UTF8 encoding

2019-01-01 09:46发布

I am in the process of fixing some bad UTF8 encoding. I am currently using PHP 5 and MySQL

In my database I have a few instances of bad encodings that print like: î

  • The database collation is utf8_general_ci
  • PHP is using a proper UTF8 header
  • Notepad++ is set to use UTF8 without BOM
  • database management is handled in phpMyAdmin
  • not all cases of accented characters are broken

What I need is some sort of function that will help me map the instances of î, í, ü and others like it to their proper accented UTF8 characters.

12条回答
长期被迫恋爱
2楼-- · 2019-01-01 10:40

If you utf8_encode() on a string that is already UTF-8 then it looks garbled when it is encoded multiple times.

I made a function toUTF8() that converts strings into UTF-8.

You don't need to specify what the encoding of your strings is. It can be Latin1 (iso 8859-1), Windows-1252 or UTF8, or a mix of these three.

I used this myself on a feed with mixed encodings in the same string.

Usage:

$utf8_string = Encoding::toUTF8($mixed_string);

$latin1_string = Encoding::toLatin1($mixed_string);

My other function fixUTF8() fixes garbled UTF8 strings if they were encoded into UTF8 multiple times.

Usage:

$utf8_string = Encoding::fixUTF8($garbled_utf8_string);

Examples:

echo Encoding::fixUTF8("Fédération Camerounaise de Football");
echo Encoding::fixUTF8("Fédération Camerounaise de Football");
echo Encoding::fixUTF8("FÃÂédÃÂération Camerounaise de Football");
echo Encoding::fixUTF8("Fédération Camerounaise de Football");

will output:

Fédération Camerounaise de Football
Fédération Camerounaise de Football
Fédération Camerounaise de Football
Fédération Camerounaise de Football

Download:

https://github.com/neitanod/forceutf8

查看更多
谁念西风独自凉
3楼-- · 2019-01-01 10:44

I found a solution after days of search. My comment is going to be buried but anyway...

  1. I get the corrupted data with php.

  2. I don't use set names UTF8

  3. I use utf8_decode() on my data

  4. I update my database with my new decoded data, still not using set names UTF8

and voilà :)

查看更多
何处买醉
4楼-- · 2019-01-01 10:46

If you have double-encoded UTF8 characters (various smart quotes, dashes, apostrophe ’, quotation mark “, etc), in mysql you can dump the data, then read it back in to fix the broken encoding.

Like this:

mysqldump -h DB_HOST -u DB_USER -p DB_PASSWORD --opt --quote-names \
    --skip-set-charset --default-character-set=latin1 DB_NAME > DB_NAME-dump.sql

mysql -h DB_HOST -u DB_USER -p DB_PASSWORD \
    --default-character-set=utf8 DB_NAME < DB_NAME-dump.sql

This was a 100% fix for my double encoded UTF-8.

Source: http://blog.hno3.org/2010/04/22/fixing-double-encoded-utf-8-data-in-mysql/

查看更多
只靠听说
5楼-- · 2019-01-01 10:47

As Dan pointed out: you need to convert them to binary and then convert/correct the encoding.

E.g., for utf8 stored as latin1 the following SQL will fix it:

UPDATE table
   SET field = CONVERT( CAST(field AS BINARY) USING utf8)
 WHERE $broken_field_condition
查看更多
初与友歌
6楼-- · 2019-01-01 10:47

i had the same problem long time ago, and it fixed it using

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15">
查看更多
伤终究还是伤i
7楼-- · 2019-01-01 10:48

Another thing to check, which happened to be my solution (found here), is how data is being returned from your server. In my application, I'm using PDO to connect from PHP to MySQL. I needed to add a flag to the connection which said get the data back in UTF-8 format

The answer was

$dbHandle = new PDO("mysql:host=$dbHost;dbname=$dbName;charset=utf8", $dbUser, $dbPass, 
    array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8'"));
查看更多
登录 后发表回答