utf8mb4 setting for talend - not working

2019-07-24 04:22发布

问题:

I am migrating the data from sql server to mysql. I am using the tool Talend(ETL) for the same.

The problem comes when I have emojis in the source (sql server) , it does not get inserted to the table in mysql. So, I know I must use utf8mb4 on mysql side.

The client settings character encoding has to be set, for the smileys to get inserted. The database, tables and the server are all on utf8mb4

But, the client i.e., talend is not utf8mb4. So where do I set this?

I tried with 'set names utf8mb4' in additional parameters of tmysqloutput. But this does not work

I have been stuck on this for days, any help on this would be greatly appreciated

Update :

The job looks like this now. But, the smileys are still getting exported as '?'

Thanks Rathi

回答1:

First, make sur that your server is properly configured to use utf8mb4. Following this tutorial, you need to add the following to your my.cnf (or my.ini if you're on Windows):

[client]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci

That tells MySQL server to use utf8mb4 and ignore any encoding set by client.
After that, I didn't need to do set any additional properties on the MySQL connection in Talend. I've executed this query in Talend to check the encoding set by it :

SHOW VARIABLES 
WHERE Variable_name LIKE 'character\\_set\\_%' OR Variable_name LIKE 'collation%'

And it returned:

|=-----------------------+-----------------=|
|Variable_Name           |Value             |
|=-----------------------+-----------------=|
|character_set_client    |utf8mb4           |
|character_set_connection|utf8mb4           |
|character_set_database  |utf8mb4           |
|character_set_filesystem|binary            |
|character_set_results   |                  |
|character_set_server    |utf8mb4           |
|character_set_system    |utf8              |
|collation_connection    |utf8mb4_unicode_ci|
|collation_database      |utf8mb4_unicode_ci|
|collation_server        |utf8mb4_unicode_ci|
'------------------------+------------------'

The following test to insert a pile of poop works:

Update

Using native MySQL components in Talend 6.3.1, you get mysql-connector-java-5.1.30-bin.jar, which is supposed to automatically detect the utf8mb4 used by the server, but for some reason (bug?) it isn't doing that.
I switched to using JDBC components, and downloaded the latest mysql connector (mysql-connector-java-5.1.45-bin.jar), I got it working by setting these additional parameters on the tJDBCConnection component :

useUnicode=true&characterEncoding=utf-8

(even if I'm specifying utf-8, the doc says it will treat it as utf8mb4)

Here's what my job looks like now :