I started working on a wordpress on my dev machine. mysql version is 5.6, and worpdress is 4.7 so its already using the utf8mb4_unicode_520_ci encoding if it detects its possible.
My problem is that on my hosting (mysql 5.5) utf8mb4_unicode_520_ci is not recognized as a valid encoding. So I'm trying to target utf8mb4_unicode_ci encoding as my hosting knows about this one, and if I understand correctly, this would - in opposition to going to utf8 - allow me to keep the 4 bytes.
I tried several different combinaison of encoding and collation set up for the db, but nothing successful (from here How to convert an entire MySQL database characterset and collation to UTF-8?).
I tried several combination of encoding and collation in the wp-config, but nothing.
Everything that is coming from the database (like post titles and post contents displays badly encoded char for all diatrics, anything else is displayed appropriately )
I think I need to convert the actual content of the database, changing charset and collation does not seems to be enough.
I found this but it does not address my problem directly, or if it does I missed it.
Any help would be appreciated
————————————————————————————————
UPDATE :
here is the precise procedure I went through:
Initial situation:
I installed a wordpress (4.6.1) locally (on my dev machine, mysql 5.6.28). I worked on the theme and plugin locally
(at this moment I have, locally, a database that is utf8_general_ci
and tables that are utf8mb4_unicode_520_ci
Problem:
I want to deploy my wordpress on my hosting (mysql: 5.5 - db collation seems to be utf8mb4_unicode_ci
).
I mysqldump the db locally, then try to import it on my hostings' phpmyadmin.
This gives error :
Unknown collation: 'utf8mb4_unicode_520_ci'
solution 1 change the tables charset to utf8mb4_unicode_ci:
On my hosting sql server, utf8mb4_unicode_520_ci is not available and I can't get a more recent version of mysql.
utf8mb4_unicode_ci seems like the closest and is available on my hosting sql server.
from various so question, I adapt a bash script to change charset and collation of my tables
for tbl in wp_sij2017_commentmeta wp_sij2017_comments wp_sij2017_cwa wp_sij2017_links wp_sij2017_options wp_sij2017_postmeta wp_sij2017_posts wp_sij2017_term_relationships wp_sij2017_term_taxonomy wp_sij2017_termmeta wp_sij2017_terms wp_sij2017_usermeta wp_sij2017_users wp_sij2017_woocommerce_api_keys wp_sij2017_woocommerce_attribute_taxonomies wp_sij2017_woocommerce_downloadable_product_permissions wp_sij2017_woocommerce_order_itemmeta wp_sij2017_woocommerce_order_items wp_sij2017_woocommerce_payment_tokenmeta wp_sij2017_woocommerce_payment_tokens wp_sij2017_woocommerce_sessions wp_sij2017_woocommerce_shipping_zone_locations wp_sij2017_woocommerce_shipping_zone_methods wp_sij2017_woocommerce_shipping_zones wp_sij2017_woocommerce_tax_rate_locations wp_sij2017_woocommerce_tax_rates; do
mysql --execute="ALTER TABLE wp_sij_2017_original_copy.${tbl} CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
done
I run this script on the local db
I now have all my tables set to collation utf8mb4_unicode_ci
My db collation is still utf8
I mysqldump the db, then import it to my hosting and...
Import is successful.
I search and replace siteurl in the db. I then visit the online website, I got SOME diatrics that renders a "question mark char"
Any text coming from the db has decoding issue AT SOME POINT
The source/html markup also has those "question mark char"
I have no idea where to look or what to do next
I'm find 'utf8mb4_unicode_520_ci' and replace with 'utf8mb4_unicode_ci' in .sql file. Its simplest why to solve this.
Clarification:
CHARACTER SETs
utf8
andutf8mb4
specify how characters are encoded into bytes.COLLATIONs
*_unicode_*
, etc, specify how those character compare.The encoding for
utf8mb4_unicode_ci
andutf8mb4_unicode_520_ci
are the same because they are encoded in the character set utf8mb4."database that is utf8_general_ci and tables that are utf8mb4_unicode_520_ci" -- that probably means that new tables in that database, unless specifically stated, will be
CHARACTER SET utf8 COLLATION utf8_general_ci
. That is the database setting is just a default forCREATE TABLE
. Since your tables are alreadyCHARACTER SET utf8mb4 COLLATION utf8mb4_unicode_520_ci
, the database default is not relevant to them.As long as the
CHARACTER SET
staysutf8mb4
, no Emoji, Chinese, etc will be lost or otherwise mangled.Do not use
mysql40
; it did not know about anyCHARACTER SETs
. Do not useCONVERT
orCAST
. Etc.I assume the
520
is coming from the output ofmysqldump
? Do you have an editor that can handle a file that big? If so, simply edit it to changeutf8mb4_unicode_520_ci
toutf8mb4_unicode_ci
throughout. Then load the dump. Problem solved?Your fix
You did
ALTER ... CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
on your local machine. That is probably an even better way -- since it will put your dev and prod machine in line with each other. That should have worked. Don't worry about what the "database" claims.