I am trying to store a tweet in my MYSQL table. The tweet is:
quiero que me escuches, no te burles no te rias, anoche tuve un sueño que te fuiste de mi vida 🎶🎶
The final two characters are both 'MULTIPLE MUSICAL NOTES' (U+1F3B6), for which the UTF-8 encoding is 0xf09f8eb6
.
The tweet_text
field in my table is encoded in utf8mb4
. But when I try to store the tweet in that column I get the following error message:
Incorrect string value: '\xF0\x9F\x8E\xB6\xF0\x9F...' for column 'tweet_text' at row 1.
What is going wrong? How can I fix this? I need to store multiple languages as well and this character set works for all languages but not for the special characters like emoticons and emojis.
This is my create table statement:
CREATE TABLE `twitter_status_data` (
`unique_status_id` bigint(20) NOT NULL AUTO_INCREMENT,
`metadata_result_type` text CHARACTER SET utf8,
`created_at` text CHARACTER SET utf8 NOT NULL COMMENT 'UTC time when this Tweet was created.',
`id` bigint(20) unsigned NOT NULL COMMENT 'Unique tweet identifier',
`id_str` text CHARACTER SET utf8 NOT NULL,
`tweet_text` text COMMENT 'Actual UTF-8 text',
`user_id_str` text CHARACTER SET utf8,
`user_name` text COMMENT 'User''s name',
`user_screen_name` text COMMENT 'Twitter handle',
`coordinates` text CHARACTER SET utf8,
PRIMARY KEY (`unique_status_id`),
KEY `user_id_index` (`user_id`),
FULLTEXT KEY `tweet_text_index` (`tweet_text`)
) ENGINE=InnoDB AUTO_INCREMENT=82451 DEFAULT CHARSET=utf8mb4;
I was finally able to figure out the issue.
I had to change some settings in mysql configuration my.ini
This article helped a lot
http://mathiasbynens.be/notes/mysql-utf8mb4#character-sets
First i changed the character set in my.ini to utf8mb4
Next i ran the following commands in mysql client
SET NAMES utf8mb4;
ALTER DATABASE dreams_twitter CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci;
Use the following command to check that the changes are made
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
I had hit the same problem and learnt the following-
Even though database has a default character set of utf-8, it's possible for database columns to have a different character set in MySQL.
Modified dB and the problematic column to UTF-8:
mysql> ALTER DATABASE MyDB CHARACTER SET 'utf8' COLLATE 'utf8_unicode_ci'
mysql> ALTER TABLE database.table MODIFY COLUMN column_name VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
Now creating new tables with:
> CREATE TABLE My_Table_Name (
twitter_id_str VARCHAR(255) NOT NULL UNIQUE,
twitter_screen_name VARCHAR(512) CHARACTER SET utf8 COLLATE utf8_unicode_ci,
.....
) CHARACTER SET utf8 COLLATE utf8_unicode_ci;
It may be obvious, but it still was surprising to me, that SET NAMES utf8
is not compatible with utf8mb4
encoding. So for some apps changing table/column encoding was not enough. I had to change encoding in app configuration.
Redmine (ruby, ROR)
In config/database.yml
:
production:
adapter: mysql2
database: redmine
host: localhost
username: redmine
password: passowrd
encoding: utf8mb4
Custom Yii application (PHP)
In config/db.php
:
return [
'class' => yii\db\Connection::class,
'dsn' => 'mysql:host=localhost;dbname=yii',
'username' => 'yii',
'password' => 'password',
'charset' => 'utf8mb4',
],
If you have utf8mb4
as a column/table encoding and still getting errors like this, make sure that you have configured correct charset for DB connection in your application.
According to the create table statement, the default charset of the table is already utf8mb4. It seems that you have a wrong connection charset.
In Java, set the datasource url like this: jdbc:mysql://127.0.0.1:3306/testdb?useUnicode=true&characterEncoding=utf-8.
"?useUnicode=true&characterEncoding=utf-8" is necessary for using utf8mb4.
It works for my application.