I have a problem inserting and reading UTF-8 content in a MySQL database. Every check I run suggests that the content in my database should be UTF-8 encoded, yet it behaves as if it were Latin-1 (ISO-8859-1) encoded. The data are initially imported by a PHP script run from the CLI.
Configuration:
Zend Framework Version: 1.10.5
mysql-server-5.0: 5.0.51a-3ubuntu5.7
php5-mysql: 5.2.4-2ubuntu5.10
apache2: 2.2.8-1ubuntu0.16
libapache2-mod-php5: 5.2.4-2ubuntu5.10
Verifications:
-mysql:
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_bin        |
| collation_server     | utf8_general_ci |
+----------------------+-----------------+
-database: created with either of these (CREATE SCHEMA is a synonym for CREATE DATABASE in MySQL):
CREATE DATABASE mydb CHARACTER SET utf8 COLLATE utf8_bin;
CREATE SCHEMA `mydb` DEFAULT CHARACTER SET utf8 COLLATE utf8_bin;
mysql> status;
--------------
mysql Ver 14.12 Distrib 5.0.51a, for debian-linux-gnu (i486) using readline 5.2
Connection id: 7
Current database: mydb
Current user: root@localhost
SSL: Not in use
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server version: 5.0.51a-3ubuntu5.7-log (Ubuntu)
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: utf8
Db characterset: utf8
Client characterset: utf8
Conn. characterset: utf8
UNIX socket: /var/run/mysqld/mysqld.sock
Uptime: 9 min 45 sec
-sql: before doing my inserts I run:
SET NAMES 'utf8';
-php: before doing my inserts I call utf8_encode() and then mb_detect_encoding(), which returns 'UTF-8'. After retrieving the content from the database, and before sending it to the user, mb_detect_encoding() also returns 'UTF-8'.
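A small PHP sketch of why the mb_detect_encoding() check above may not be conclusive: utf8_encode() converts ISO-8859-1 to UTF-8, so its output is always valid UTF-8, even when the input was already UTF-8 and the result is therefore double-encoded. The sample string is a hypothetical stand-in for the imported data, and mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1') is used in place of utf8_encode(); it performs the same conversion, while utf8_encode() is deprecated as of PHP 8.2. Requires the mbstring extension.

```php
<?php
// Hypothetical sample: "café" whose "é" is already UTF-8 (bytes C3 A9).
$original = "caf\xC3\xA9";

// Same conversion as utf8_encode(): read each byte as ISO-8859-1 and
// re-encode it as UTF-8. On input that is already UTF-8 this double-encodes.
$converted = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

echo bin2hex($original), "\n";   // 636166c3a9
echo bin2hex($converted), "\n";  // 636166c383c2a9 ("cafÃ©" in UTF-8)

// Both strings are valid UTF-8, so detection reports UTF-8 for each.
echo mb_detect_encoding($original, 'ASCII, UTF-8', true), "\n";   // UTF-8
echo mb_detect_encoding($converted, 'ASCII, UTF-8', true), "\n";  // UTF-8
```

In other words, a 'UTF-8' answer only shows that the bytes are valid UTF-8, not that they were converted the right number of times; if the imported data are already UTF-8, the conversion step would not be needed.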
Validation test:
The only way I can get the content to display properly is to set the content type to Latin-1 (sniffing the traffic shows the Content-Type header with ISO-8859-1):
ini_set('default_charset', 'ISO-8859-1');
This test suggests that the content comes out as Latin-1, and I don't understand why. Does anybody have an idea?
Thanks.
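What does SHOW FULL COLUMNS FROM table; show? A table having a default charset does not mean its columns use it: for example, a column declared CHARACTER SET latin1 inside a table whose default charset is utf8 is perfectly valid.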
Well, I've found that SET NAMES isn't really all that great. Take a peek at the docs... What I typically do is execute 4 queries:
Give that a shot and see if that does it for you...
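Why the connection settings matter, as a hedged PHP sketch: the status output in the question only describes the mysql CLI's own session, and a PHP connection can end up with a different character set. MySQL converts text between the connection's declared character set and the column's character set in both directions. The sketch below simulates one possible mismatch (the PHP connection left at latin1 while the column is utf8) with mb_convert_encoding() instead of a real server, so no database is involved; ISO-8859-1 stands in for MySQL's latin1, which is really cp1252 but identical for this character. This is an illustration of the mechanism, not a diagnosis of the setup above.

```php
<?php
// Hypothetical scenario: PHP sends UTF-8 bytes, but its connection
// character set is latin1 while the column is utf8.
$sent = "caf\xC3\xA9";  // "café" as UTF-8

// On INSERT the server reads the bytes as latin1 and converts them to utf8
// for storage (ISO-8859-1 stands in for MySQL's latin1 here).
$stored = mb_convert_encoding($sent, 'UTF-8', 'ISO-8859-1');

// On SELECT it converts the stored utf8 back to the connection's latin1.
$returned = mb_convert_encoding($stored, 'ISO-8859-1', 'UTF-8');

echo bin2hex($stored), "\n";    // 636166c383c2a9: double-encoded in the table
echo bin2hex($returned), "\n";  // 636166c3a9: PHP gets its original bytes back
```

From PHP's side the round trip looks lossless, so checks on the retrieved string pass, while the table itself holds double-encoded text; a client connected with utf8, such as the mysql CLI, would see "cafÃ©".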
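Oh, and remember: every character <= 127 is encoded as the same single byte in UTF-8 and in ISO-8859-1. So if the stream contains only characters <= 127, mb_detect_encoding cannot tell the two encodings apart and will simply report a candidate according to its detection order (see mb_detect_order()), so its answer says nothing about the real encoding...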
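A quick PHP check of that point, assuming only the mbstring extension: for a pure-ASCII string (a hypothetical sample), converting from ISO-8859-1 to UTF-8 leaves the bytes unchanged, and the string passes the validity check for both encodings, so a detector has nothing but its candidate order to go on.

```php
<?php
// Hypothetical pure-ASCII sample: every byte is <= 127.
$ascii = "hello";

// Converting from ISO-8859-1 to UTF-8 leaves the bytes untouched.
$asUtf8 = mb_convert_encoding($ascii, 'UTF-8', 'ISO-8859-1');
var_dump($asUtf8 === $ascii);                       // bool(true)

// The string is valid in both encodings, so validity alone cannot decide.
var_dump(mb_check_encoding($ascii, 'UTF-8'));       // bool(true)
var_dump(mb_check_encoding($ascii, 'ISO-8859-1'));  // bool(true)
```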