Which collation to use so that `ş` and `s` are tre

Posted 2020-05-09 19:30

Question:

The issue is that ş and s are interpreted by MySQL as identical values.

I'm new to MySQL, so I have no idea which collations would view them as unique.

The collations that I've tried using which don't work are:

  1. utf8_general_ci
  2. utf8_unicode_520_ci
  3. utf8mb4_unicode_ci
  4. utf8mb4_unicode_520_ci

Does anybody know which collation to use?

P.S. I also really need the collation to handle emojis and other non-Latin characters, and, as far as I understand MySQL and collations, the only collations able to do this are the unicode ones?

Answer 1:

utf8_turkish_ci and utf8_romanian_ci treat s and ş as different letters -- as shown in http://mysql.rjweb.org/utf8_collations.html

(Plus, of course, utf8_bin.)
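
A quick way to check a candidate collation before committing to it is to compare the two letters directly. This is only a sketch: it assumes the connection can use utf8mb4, it uses the utf8mb4_ counterparts of the collations named above, and the column aliases are made up for illustration.

```sql
-- Make sure the string literals below are sent and interpreted as utf8mb4.
SET NAMES utf8mb4;

-- Each column is 1 if that collation treats the two letters as equal, 0 otherwise.
SELECT
    's' = 'ş' COLLATE utf8mb4_unicode_ci  AS unicode_equal,
    's' = 'ş' COLLATE utf8mb4_turkish_ci  AS turkish_equal,
    's' = 'ş' COLLATE utf8mb4_romanian_ci AS romanian_equal,
    's' = 'ş' COLLATE utf8mb4_bin         AS bin_equal;
```

Keep in mind that a _bin collation compares raw code points, so it also distinguishes upper and lower case, and that a language-specific collation changes how other letters of that language compare as well.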

For your added question: You are looking for a "character set" (not a "collation") that can represent Emoji and other non-Latin characters -- UTF-8 is the one to use. In MySQL, it is utf8mb4. The "collations" that are associated with that are named utf8mb4_.... Collations control ordering and equality, as indicated in the first part of your question about s and ş.
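
To make the split between character set and collation concrete, here is a minimal sketch of a column that declares both. The table, column, and index names are hypothetical, and utf8mb4_turkish_ci is assumed to be an acceptable choice for the rest of your data.

```sql
-- Hypothetical table: the character set decides what can be stored,
-- the collation decides how stored values are compared and sorted.
CREATE TABLE words (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    word VARCHAR(50)
         CHARACTER SET utf8mb4
         COLLATE utf8mb4_turkish_ci,
    UNIQUE KEY uq_word (word)
);

-- The unique index uses the column's collation, so this insert only
-- succeeds if that collation treats the two letters as distinct.
INSERT INTO words (word) VALUES ('s'), ('ş');
```

An existing table can be switched with ALTER TABLE ... CONVERT TO CHARACTER SET utf8mb4 COLLATE ..., but try that on a copy first, since rows that become equal or unequal under the new collation can affect unique indexes.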

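MySQL's CHARACTER SET utf8 (an alias for utf8mb3) stores at most 3 bytes per character, which makes it a subset of utf8mb4. Either can handle all the "letters" in the world. But only utf8mb4 can store the 4-byte characters, which include Emoji and some Chinese characters.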
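
The difference comes down to encoded length: in UTF-8, ş takes 2 bytes while a typical Emoji takes 4. As a small sketch (assuming a utf8mb4 connection; the aliases are made up), you can see this with LENGTH(), which counts bytes, next to CHAR_LENGTH(), which counts characters.

```sql
-- Make sure the literals are sent and interpreted as utf8mb4.
SET NAMES utf8mb4;

-- LENGTH() returns the size in bytes, CHAR_LENGTH() the number of characters.
SELECT
    LENGTH('ş')       AS s_cedilla_bytes,
    CHAR_LENGTH('ş')  AS s_cedilla_chars,
    LENGTH('😀')      AS emoji_bytes,
    CHAR_LENGTH('😀') AS emoji_chars;
```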