I have a table encoded in latin1 with the collation latin1_bin.
In my table there is a column comments of type TEXT. As you know, this column inherits the table's encoding and collation, but now I need to change it to utf8 with utf8_general_ci, because I'm starting to store special characters in comments.
Would there be any downside if I used a command like the following?
alter table notebooks modify comments text CHARACTER SET utf8 COLLATE utf8_general_ci;
Thank you for your answer.
Danger: I think that ALTER will destroy the existing text.
Also, your 'name' looks Chinese, so I would guess that you want to store Chinese characters? In that case, you should use utf8mb4, not just utf8, because some Chinese characters take 4 bytes (they are not in the Unicode BMP).
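As a quick illustration (just a sketch; the literal is one example character outside the BMP, U+2070E, and it assumes your client connection is utf8mb4):
SELECT HEX(CONVERT('𠜎' USING utf8mb4)) AS hex_bytes,      -- F0A09C8E: four bytes
       LENGTH(CONVERT('𠜎' USING utf8mb4)) AS byte_count,  -- 4
       CHAR_LENGTH(CONVERT('𠜎' USING utf8mb4)) AS chars;  -- 1
The 3-byte utf8 (utf8mb3) cannot hold that fourth byte, so such characters would be rejected or truncated.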
I believe you need 2 steps:
ALTER TABLE notebooks MODIFY comments BLOB;
ALTER TABLE notebooks MODIFY comments TEXT
    CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;
Otherwise the latin1 characters will be "converted" to utf8mb4. But if you really have Chinese in the column, you do not have latin1. The 2-step ALTER above (1) turns off any knowledge of the character set, and (2) establishes that the bytes are really utf8mb4-encoded.
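If you want to see what is actually stored, a simple check (just a sketch; table and column names as in the question) is to look at the raw bytes and the declared character set before and after each step:
-- The raw bytes should come out identical before and after the two ALTERs
SELECT HEX(comments) FROM notebooks LIMIT 5;
-- The declared character set / collation of the column should change
SELECT character_set_name, collation_name
  FROM information_schema.columns
 WHERE table_schema = DATABASE()
   AND table_name = 'notebooks'
   AND column_name = 'comments';
A single-step ALTER, by contrast, would rewrite the data (transcoding it as if it were latin1), so the HEX() output would change for any non-ASCII bytes.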
To be safer, first do
RENAME TABLE notebooks TO old;
CREATE TABLE notebooks LIKE old;
INSERT INTO notebooks SELECT * FROM old;
Then do the two ALTERs and test the result. If there is trouble, you can RENAME to get back the old copy.
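For example (names as above; the notebooks_broken name is only for illustration), RENAME TABLE can swap both tables in one atomic statement:
-- Set the damaged table aside and bring the untouched copy back
RENAME TABLE notebooks TO notebooks_broken, old TO notebooks;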
Specifying any collation that is not a direct integer comparison of the native character set's code values will slow down your query. Whether it will slow it down noticeably is another issue: looking up the weight of this character and the weight of that one in a table and comparing the two results is still much, MUCH faster than retrieving on-disk information from a database, wouldn't you imagine?
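If you want a rough feel for that difference, one crude sketch (arbitrary literals and repeat count; BENCHMARK() only times an in-memory expression and ignores all I/O) is:
-- Binary collation: straight byte comparison of the encoded strings
SELECT BENCHMARK(5000000, CONVERT('abc' USING utf8mb4) COLLATE utf8mb4_bin = 'ABC');
-- Case-insensitive collation: needs a weight lookup per character
SELECT BENCHMARK(5000000, CONVERT('abc' USING utf8mb4) COLLATE utf8mb4_general_ci = 'ABC');
Compare the elapsed times the client reports; on a real query the difference is usually buried under disk and buffer-pool work.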