Effects of updating a table with rows from utf8_tu

2019-07-02 01:22发布

问题:

I was unable to join some tables because some of the tables/rows were utf8_general_ci and some were utf8_turkish_ci. Thus I had to dublicate the turkish one, convert it to general and finally use it. However I wonder, what will happen to my application if I convert the original table from turkish to general? I use MySQL with PHP.

This was the initial error: Illegal mix of collations (utf8_general_ci,IMPLICIT) and (utf8_turkish_ci,IMPLICIT) for operation '='

回答1:

Your columns' data are stored using a character set. In this case it seems to be utf8.

When you operate upon those columns (doing, for example, equality comparisons or ordering), MySQL employs a collation. Each column has a default collation, which it inherits from the table's default collation.

Indexes have the column's default collation baked in to them so they can function efficiently.

You can do an equality comparison that's qualified by collation. For example, in a JOIN you can specify

ON (turkish.village_name COLLATE utf8_general_ci) = euro.village_name

or perhaps

ON turkish.village_name = (euro.village_name COLLATE utf8_turkish_ci)

That should eliminate your illegal mix of collations without requiring you to alter your table. This may help you avoid the database change you're asking about. But beware, using the COLLATE qualifier can defeat the use of an index. If you have a large table and you are relying on indexes for performance, this may be unhelpful.

So, what will happen if you alter your tables to change the default collation?

  1. Your data will not change (unless you also alter the character set). That is good.
  2. Any indexes involving columns with collations will be regenerated.
  3. Your comparisons and orderings may change. I don't know Turkish, so I can't tell you what might break. But, for example, in Spanish the letters N and Ñ are not the same. N comes before Ñ in a Spanish collation, but in the general collation they are treated as the same. There may be some aspect of the Turkish alphabet that works the same, so your ORDER BY results will be incorrect.

But, you can fix that by specifying a COLLATE modifier in your ORDER BY clause.

ORDER BY (euro.village_name COLLATE utf8_turkish_ci)