I'm porting a PHP Web application I wrote from MySQL 5 to SQLite 3. The text encoding for both is UTF-8 (for all fields, tables, and databases). I'm having trouble transferring a geo database with special characters.
mb_detect_encoding() reports the data returned by both as UTF-8.
For example,
Raw output:
MySQL (correct): Dārāb, Iran
SQLite (incorrect): DÄrÄb, Iran
JSON-encoded:
MySQL (correct): D\u0101r\u0101b, Iran
SQLite (incorrect): D\u00c4\u0081r\u00c4\u0081b, Iran
What fixes the problem:
$sqlite_output = utf8_encode($sqlite_output);
$sqlite_output = utf8_decode($sqlite_output);
I imagine there's a way of repairing the SQLite database. Thank you in advance.
The default PHP distribution builds libsqlite in ISO-8859-1 encoding mode. However, this is a misnomer; rather than handling ISO-8859-1, it operates according to your current locale settings for string comparisons and sort ordering. So, rather than ISO-8859-1, you should think of it as being '8-bit' instead.
You're probably going to have to transfer the data again from MySQL to SQLite. I don't think you can predictably revert to the proper encoding, as it seems SQLite interpreted UTF-8 input as non-UTF-8, or vice versa, when the data first arrived, and therefore did not store it in a proper format.
So try the transfer again, making sure every step in the chain from MySQL to SQLite is aware of the UTF-8 encoding.
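The JSON output in the question is consistent with UTF-8 bytes having been read as ISO-8859-1 and then encoded to UTF-8 a second time. Here is a minimal sketch that reproduces that pattern and reverses it, assuming the mbstring extension is loaded and PHP 7 or later (for the \u{...} string escapes). The variable names are illustrative, and it does not show that every row in a real database was damaged in the same way.

```php
<?php
// Correct UTF-8 text: "ā" is U+0101, stored as the two bytes 0xC4 0x81.
$correct = "D\u{0101}r\u{0101}b, Iran";

// Simulate the damage: treat each UTF-8 byte as one ISO-8859-1 character
// and encode the result to UTF-8 again (0xC4 0x81 becomes U+00C4 U+0081).
$doubleEncoded = mb_convert_encoding($correct, 'UTF-8', 'ISO-8859-1');

// Reverse it: map each of those code points back to a single byte.
$repaired = mb_convert_encoding($doubleEncoded, 'ISO-8859-1', 'UTF-8');

echo json_encode($doubleEncoded), "\n"; // matches the "incorrect" JSON in the question
echo json_encode($repaired), "\n";      // matches the "correct" JSON in the question
```

The reversal is only safe when the stored text really was double-encoded throughout. Applied to text that is already correct, it replaces characters outside ISO-8859-1 with "?", which is one reason re-importing from the source is the more predictable route.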
Well, thanks for the advice and comments. Unfortunately, no matter which configuration I chose, it wouldn't take. I ended up simply instantiating two PDO objects and, using a while loop, inserting one row at a time. (I used mysqldump's --no-data option to get the structure and modified that by hand.)
It took about 10 minutes to insert ~10,000 rows (9.4 MB of data) on my 256 MB CentOS box. (So if you're in a shared environment, be wary of the maximum execution time.) The SQLite database now returns proper Unicode data.
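A row-by-row copy of this kind might look like the sketch below. It is not the code actually used: the copyTable() helper, the table and column names, and the connection details in the comments are all hypothetical. Wrapping the inserts in a single transaction is an addition of this sketch. It usually speeds up bulk inserts into SQLite considerably, though how much it would have helped here was not measured.

```php
<?php
// Copy every row of one table from $src to $dst, one row at a time.
// $table and $columns are assumed to be trusted identifiers; they are not escaped.
function copyTable(PDO $src, PDO $dst, string $table, array $columns): int
{
    // Fail loudly instead of silently skipping rows.
    $src->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $dst->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $columnList = implode(', ', $columns);
    $placeholders = implode(', ', array_fill(0, count($columns), '?'));

    $select = $src->query("SELECT $columnList FROM $table");
    $insert = $dst->prepare("INSERT INTO $table ($columnList) VALUES ($placeholders)");

    $copied = 0;
    $dst->beginTransaction(); // one transaction instead of one commit per row
    while ($row = $select->fetch(PDO::FETCH_NUM)) {
        $insert->execute($row); // bound parameters pass the bytes through unchanged
        $copied++;
    }
    $dst->commit();

    return $copied;
}

// Hypothetical usage. charset=utf8 in the DSN asks MySQL to send UTF-8 to the client.
// $mysql  = new PDO('mysql:host=localhost;dbname=geo;charset=utf8', 'user', 'secret');
// $sqlite = new PDO('sqlite:/path/to/geo.sqlite3');
// echo copyTable($mysql, $sqlite, 'places', ['id', 'name']), " rows copied\n";
```

The target table has to exist in the SQLite database before the helper is called, which is where the hand-edited mysqldump --no-data schema comes in.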
Note to self: It's easier to code a workaround than to find the recommended solution.