Special characters get lost in MySQL export/import

2019-05-06 14:33发布

问题:

I'm trying to move a MySQL 3.23.58 database to a different server running 5.5.19.

The old one has latin1 encoding specified, and as far as I can tell the underlying data is indeed honestly latin1. I have tried many things, chiefly:

  • exporting from terminal with mysqldump and the latin1 encoding flag.
  • editing in vim to change "TYPE=InnoDB" to "ENGINE=InnoDB" for MySQL 5 compatibility.
  • importing to the new server from terminal.

Browsing the old server (in Sequel Pro for Mac, or MySQL Query Browser on PC), special characters don't always show properly, but they're there (looking at the binary in hex). (And in any case it works with the PHP web app.)

Browsing the new server, all special characters appear to have been replaced by question marks. I know that sometimes special characters can display as a question mark (or �) if the wrong encoding is specified. But these appear to be genuine straight-up encoded ASCII question marks on a binary level. The special characters (chiefly curly quotation marks and dashes) appear to have been lost, or destroyed, in the export/import.

Any idea why?

I know there are many things that can go wrong with encoding, with many different things at fault. I have been reading about this for several days (here and elsewhere) and tried setting all the right character encodings, tried UTF-8, tried casting and converting, tried Sequel Pro's export/import (as opposed to the terminal), etc. But I am stumped.

回答1:

Good, it looks like we've narrowed down your problem. I found this post

If your text editor is vim, then most likely the "<92>" is the hexadecimal code of an extended ASCII character. In this case, it is Hex(92) or Oct(222) or Dec(146) , which is "right single quotation mark"; not to confused with "single quote" which is ASCII Dec code 39.

One way to remove all non-ASCII characters from your file could be -

perl -plne 's/[^[:ascii:]]//g' <your_file>

Otherwise just search and replace "<92>" and "<97>" in your exported file with an appropriate character.

[Edit]

I'm not a VIM user but this post addresses the issue of replacing the <92> smart quote characters

For each value that you see in your file, just do a string substitution, like so:

:%s/<93>/\’/g

of course, you can’t just type that <93> in there, so to get it in there you use

CTRL-V x 93

which inserts hex 93 in place.

In recently exported CSV’s from excel, I’ve seen hex 91-97.