UTF-8 Corrupted from MySQL to SQLite

Posted 2019-08-30 08:22

I'm porting a PHP Web application I wrote from MySQL 5 to SQLite 3. The text encoding for both is UTF-8 (for all fields, tables, and databases). I'm having trouble transferring a geo database with special characters.

mb_detect_encoding() reports UTF-8 for the data returned by both databases.

For example,

Raw output:

MySQL (correct): Dārāb, Iran
SQLite (incorrect): DÄrÄb, Iran

JSON-encoded:

MySQL (correct): D\u0101r\u0101b, Iran
SQLite (incorrect): D\u00c4\u0081r\u00c4\u0081b, Iran

What fixes the problem:

$sqlite_output = utf8_encode($sqlite_output);
$sqlite_output = utf8_decode($sqlite_output);
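
The JSON output above is consistent with UTF-8 text that was encoded a second time: the two bytes of ā (C4 81) show up as the separate code points U+00C4 and U+0081. Here is a minimal sketch of that mechanism and of undoing one layer of it, assuming the mbstring extension is loaded. Whether every row in the real database is damaged in the same way is an assumption that should be checked before rewriting any data.

```php
<?php
// The correctly encoded place name from the question, written with
// code point escapes so the result does not depend on this file's encoding.
$original = "D\u{0101}r\u{0101}b, Iran";

// Simulate the corruption: read each UTF-8 byte as an ISO-8859-1
// character and encode the result as UTF-8 a second time.
$corrupted = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

echo json_encode($original), "\n";   // "D\u0101r\u0101b, Iran"
echo json_encode($corrupted), "\n";  // "D\u00c4\u0081r\u00c4\u0081b, Iran"

// Undo exactly one layer of encoding.
$repaired = mb_convert_encoding($corrupted, 'ISO-8859-1', 'UTF-8');

var_dump($repaired === $original);   // bool(true)
```

Running the reverse step on a value that was never double-encoded would damage it, so a repair pass over a table should only touch rows confirmed to be affected.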

I imagine there's a way to repair the SQLite database itself. Thank you in advance.

Tags: php sqlite utf-8
3 answers
戒情不戒烟
#2 · 2019-08-30 08:57

The default PHP distribution builds libsqlite in ISO-8859-1 encoding mode. However, this is a misnomer; rather than handling ISO-8859-1, it operates according to your current locale settings for string comparisons and sort ordering. So, rather than ISO-8859-1, you should think of it as being '8-bit' instead.

走好不送
#3 · 2019-08-30 09:04

You're probably going to have to transfer the data again from MySQL to SQLite. I don't think you can reliably revert to the proper encoding: it seems the UTF-8 input was interpreted as non-UTF-8 (or vice versa) when the data first arrived, so it was not stored in a proper format.

So try the transfer again, making sure every step in the chain from MySQL to SQLite is aware of the UTF-8 encoding.
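
One way to make the PHP end of that chain explicit, as a rough sketch: name the connection charset in the MySQL DSN and check the encoding SQLite reports. The helper `mysqlDsn()` and the host and database names are made up for illustration. `utf8mb4` assumes MySQL 5.5.3 or later (older servers only offer `utf8`), and PDO honours the DSN `charset` option from PHP 5.3.6 onward.

```php
<?php
// Hypothetical helper: builds a MySQL DSN that names the connection charset,
// so the client and server agree on how bytes on the wire are interpreted.
function mysqlDsn(string $host, string $dbname): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbname);
}

// Placeholder host and database names, not taken from the question.
$dsn = mysqlDsn('localhost', 'geo');

// On the SQLite side, the text encoding of a database can be inspected.
// An in-memory database is used here so the snippet runs on its own.
$sqlite   = new PDO('sqlite::memory:');
$encoding = $sqlite->query('PRAGMA encoding')->fetchColumn();

echo $dsn, "\n";
echo $encoding, "\n";
```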

Ridiculous、
#4 · 2019-08-30 09:07

Well, thanks for the advice and comments. Unfortunately, no matter which configuration I chose, it wouldn't take. I ended up simply instantiating two PDO objects and, using a while loop, inserting one row at a time. (I used mysqldump's --no-data option to get the structure and modified that by hand.)

It took about 10 minutes to insert ~10,000 rows equal to 9.4MB of data on my 256MB CentOS box. (So if you're on a shared environment, be wary of the maximum execution time.) The SQLite database now returns proper Unicode data.
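
The two-PDO loop described above might look roughly like this. It is a sketch, not the code that was actually run: `copyTable()`, the `places` table, and its columns are invented for illustration, and an in-memory SQLite database stands in for the MySQL source so the snippet runs without a server. The inserts are wrapped in a single transaction because SQLite otherwise commits each statement separately, which is a common cause of slow bulk inserts. Whether that explains the 10 minutes reported here was not measured.

```php
<?php
/**
 * Copy every row of $table from $src to $dst inside one transaction.
 * Table and column names are interpolated into the SQL, so they must
 * come from trusted code, never from user input.
 */
function copyTable(PDO $src, PDO $dst, string $table, array $columns): int
{
    $colList = implode(', ', $columns);
    $marks   = implode(', ', array_fill(0, count($columns), '?'));

    $select = $src->query("SELECT $colList FROM $table");
    $insert = $dst->prepare("INSERT INTO $table ($colList) VALUES ($marks)");

    $copied = 0;
    $dst->beginTransaction();
    try {
        // Values are passed as bound parameters, so no manual re-encoding
        // of the strings happens between the two databases.
        while ($row = $select->fetch(PDO::FETCH_NUM)) {
            $insert->execute($row);
            $copied++;
        }
        $dst->commit();
    } catch (Throwable $e) {
        $dst->rollBack();
        throw $e;
    }
    return $copied;
}

// Stand-in source. In the real migration $src would be a MySQL connection.
$src = new PDO('sqlite::memory:');
$dst = new PDO('sqlite::memory:');
foreach ([$src, $dst] as $db) {
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT)');
}
$src->prepare('INSERT INTO places (id, name) VALUES (?, ?)')
    ->execute([1, "D\u{0101}r\u{0101}b, Iran"]);

$copied = copyTable($src, $dst, 'places', ['id', 'name']);
$name   = $dst->query('SELECT name FROM places WHERE id = 1')->fetchColumn();

echo $copied, " row(s) copied: ", $name, "\n";
```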

Note to self: It's easier to code a workaround than to find the recommended solution.
