MYSQL special chars issue

2019-07-22 14:18发布

问题:

I´ve been having this problem for a long time, I´ve searched the internet many times for the solution, tried lots of them but not found an adequate solution.

I really don´t know what to do so if you could please help me I´d be very thankful.
(Sorry for my poor english).

Question: How can I solve the charset incompatibility between the input archive and a MYSql table?

Problem: When importing the archive from on my computer the information appears in my database, but some chars as ('ã', 'ç', 'á', etc..) are shown as ?.

Aditional information

  1. I'm using MYSql, my version and variable status are:

    MySQL VERSION : 5.5.10  
    HOST : localhost  
    USER : root  
    PORT : 3306  
    SERVER DEFAULT CHARSET : utf8  
    character_set_client  :  utf8  
    character_set_connection  :  utf8  
    character_set_database  :  utf8  
    character_set_filesystem  :  BINARY  
    character_set_results  :  utf8  
    character_set_server  :  utf8  
    character_set_system  :  utf8  
    collation_connection  :  utf8_general_ci  
    collation_database  :  utf8_general_ci  
    collation_server  :  utf8_general_ci  
    completion_type  :  NO_CHAIN  
    concurrent_insert  :  AUTO  
    
  2. The query that´s being used is:

    LOAD DATA LOCAL INFILE 'xxxxx/file.txt' 
    INTO TABLE xxxxTable 
    FIELDS TERMINATED BY ';' 
    LINES TERMINATED BY ' ' 
    IGNORE 1 LINES
    ( status_ordenar,numero,newstatus,rede,data_emissao,inicio,termino,tempo_indisp
    , cli_afet,qtd_cli_afet,cod_encerr,uf_ofensor,localidades,clientes_afetados 
    , especificacao,equipamentos,area_ofens,descricao_encerr,criticidade,cod_erro
    , observacao,id_falha_perc,id_falha_conf,nba,solucao,falhapercebida,falhaconfirmada
    , resp_i,resp_f,resp_ue,pre_handover,falha_identificada,report_netcool,tipo_falha
    , num_notificacao,equip_afetados,descricao) 
    
  3. About the file being imported: I´ve opened the file with open office whith 3 charsets:

    UTF8 - Gave me strange chars in place of the 'ç', 'ã', etc...
    ISO-8859-1 - OK.
    WIN-1252 - OK.
    ASCII/US - OK.

  4. Already tested: I´ve tested some charsets in my database: latin1, utf-8, ascii, but all of them gave me the same result (? instead of 'á', 'ç' etc).

  5. Extra: I'm using Java with Java JDBC to generate and send the query.

回答1:

file.txt is saved in ISO-8859-1 or Windows-1252 (these two are very similar), and being interpreted as UTF-8 by MySQL. These are incompatible.

How can I tell?

  • See point 3.: the file displays correctly when interpreted as ISO-8859-1 or Windows-1252.
  • See point 1.: character_set_database : utf8

Solution: either convert the file to UTF-8, or tell MySQL to interpret it as ISO-8859-1 or Windows-1252.

Background: the characters you provide (ã etc.) are single-byte values in windows-1252, and these bytes are illegal values in UTF-8, thus yielding the '?'s (unicode replacement characters).

Snippet from MySQL docs:

LOAD DATA INFILE Syntax

The character set indicated by the character_set_database system variable is used to interpret the information in the file.



回答2:

Saved your characters with standard Windows Notepad as UTF-8 file (Notepad++ is also OK).

Exact file content:

'ã', 'ç', 'á'

MySQL version: 5.5.22
Database charset: utf8
Database collation: utf8_general_ci

CREATE TABLE `abc` (
  `qwe` text
) ENGINE=InnoDB DEFAULT CHARSET=utf8

Imported data with command

LOAD DATA LOCAL INFILE 'C:/test/utf8.txt' 
INTO TABLE abc
FIELDS TERMINATED BY ';' 
LINES TERMINATED BY ' ' 
IGNORE 1 LINES
( qwe) 

Result (displayed in SQLyog):


So, first - you should check original file with reliable editor (notepad, notepad++). If file corrupted, then you should take another file.

Second - if file is OK, add you Java code for sending data to MySql into question.