Issue when deploying mysql db (utf8mb4_unicode_520

2019-09-11 15:32发布

问题:

I started working on a wordpress on my dev machine. mysql version is 5.6, and worpdress is 4.7 so its already using the utf8mb4_unicode_520_ci encoding if it detects its possible.

My problem is that on my hosting (mysql 5.5) utf8mb4_unicode_520_ci is not recognized as a valid encoding. So I'm trying to target utf8mb4_unicode_ci encoding as my hosting knows about this one, and if I understand correctly, this would - in opposition to going to utf8 - allow me to keep the 4 bytes.

I tried several different combinaison of encoding and collation set up for the db, but nothing successful (from here How to convert an entire MySQL database characterset and collation to UTF-8?).

I tried several combination of encoding and collation in the wp-config, but nothing.

Everything that is coming from the database (like post titles and post contents displays badly encoded char for all diatrics, anything else is displayed appropriately )

menu label from the database display incorrectly, where the hardcoded/translated label display correctly

I think I need to convert the actual content of the database, changing charset and collation does not seems to be enough.

I found this but it does not address my problem directly, or if it does I missed it.

Any help would be appreciated

————————————————————————————————

UPDATE :

here is the precise procedure I went through:

Initial situation:

I installed a wordpress (4.6.1) locally (on my dev machine, mysql 5.6.28). I worked on the theme and plugin locally

(at this moment I have, locally, a database that is utf8_general_ci and tables that are utf8mb4_unicode_520_ci

Problem:

I want to deploy my wordpress on my hosting (mysql: 5.5 - db collation seems to be utf8mb4_unicode_ci). I mysqldump the db locally, then try to import it on my hostings' phpmyadmin. This gives error :

Unknown collation: 'utf8mb4_unicode_520_ci'

solution 1 change the tables charset to utf8mb4_unicode_ci:

On my hosting sql server, utf8mb4_unicode_520_ci is not available and I can't get a more recent version of mysql.

utf8mb4_unicode_ci seems like the closest and is available on my hosting sql server.

from various so question, I adapt a bash script to change charset and collation of my tables

for tbl in wp_sij2017_commentmeta wp_sij2017_comments wp_sij2017_cwa wp_sij2017_links wp_sij2017_options wp_sij2017_postmeta wp_sij2017_posts wp_sij2017_term_relationships wp_sij2017_term_taxonomy wp_sij2017_termmeta wp_sij2017_terms wp_sij2017_usermeta wp_sij2017_users wp_sij2017_woocommerce_api_keys wp_sij2017_woocommerce_attribute_taxonomies wp_sij2017_woocommerce_downloadable_product_permissions wp_sij2017_woocommerce_order_itemmeta wp_sij2017_woocommerce_order_items wp_sij2017_woocommerce_payment_tokenmeta wp_sij2017_woocommerce_payment_tokens wp_sij2017_woocommerce_sessions wp_sij2017_woocommerce_shipping_zone_locations wp_sij2017_woocommerce_shipping_zone_methods wp_sij2017_woocommerce_shipping_zones wp_sij2017_woocommerce_tax_rate_locations wp_sij2017_woocommerce_tax_rates; do
 mysql --execute="ALTER TABLE wp_sij_2017_original_copy.${tbl} CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
done

I run this script on the local db I now have all my tables set to collation utf8mb4_unicode_ci

My db collation is still utf8

I mysqldump the db, then import it to my hosting and...

Import is successful.

I search and replace siteurl in the db. I then visit the online website, I got SOME diatrics that renders a "question mark char"

Any text coming from the db has decoding issue AT SOME POINT

The source/html markup also has those "question mark char"

I have no idea where to look or what to do next

回答1:

Clarification: CHARACTER SETs utf8 and utf8mb4 specify how characters are encoded into bytes. COLLATIONs *_unicode_*, etc, specify how those character compare.

The encoding for utf8mb4_unicode_ci and utf8mb4_unicode_520_ci are the same because they are encoded in the character set utf8mb4.

"database that is utf8_general_ci and tables that are utf8mb4_unicode_520_ci" -- that probably means that new tables in that database, unless specifically stated, will be CHARACTER SET utf8 COLLATION utf8_general_ci. That is the database setting is just a default for CREATE TABLE. Since your tables are already CHARACTER SET utf8mb4 COLLATION utf8mb4_unicode_520_ci, the database default is not relevant to them.

As long as the CHARACTER SET stays utf8mb4, no Emoji, Chinese, etc will be lost or otherwise mangled.

Do not use mysql40; it did not know about any CHARACTER SETs. Do not use CONVERT or CAST. Etc.

I assume the 520 is coming from the output of mysqldump? Do you have an editor that can handle a file that big? If so, simply edit it to change utf8mb4_unicode_520_ci to utf8mb4_unicode_ci throughout. Then load the dump. Problem solved?

Your fix

You did ALTER ... CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci on your local machine. That is probably an even better way -- since it will put your dev and prod machine in line with each other. That should have worked. Don't worry about what the "database" claims.



回答2:

I'm find 'utf8mb4_unicode_520_ci' and replace with 'utf8mb4_unicode_ci' in .sql file. Its simplest why to solve this.