Why does MySQL use latin1_swedish_ci as the default?

Posted 2019-02-01 15:57

Does anyone know why latin1_swedish is the default for MySQL? It would seem to me that UTF-8 would be more compatible, right?

Defaults are usually chosen because they are the best universal choice, but in this case that does not seem to be what they did.

4 answers
我只想做你的唯一
#2 · 2019-02-01 16:34

Using a single-byte encoding has some advantages over multi-byte encodings, e.g. the length of a string in bytes equals its length in characters. With a multi-byte encoding, it is not intuitively clear whether functions like SUBSTRING count characters or bytes. For the same reason, supporting multi-byte encodings requires quite a big change to the internal code.
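To make the bytes-versus-characters point concrete, here is a minimal sketch in Python rather than MySQL. The sample string and variable names are only illustrative, and Python's codecs stand in for the server's character set handling, which is an assumption of this example.

```python
# Illustrative only: Python codecs stand in for MySQL character sets.
text = "Åäö"  # three characters, all present in latin1

latin1_bytes = text.encode("latin-1")  # single-byte encoding
utf8_bytes = text.encode("utf-8")      # multi-byte encoding

char_count = len(text)
latin1_len = len(latin1_bytes)  # one byte per character
utf8_len = len(utf8_bytes)      # two bytes per character for these letters

# Taking "the first 3" means different things for characters and bytes.
first_three_chars = text[:3]
# A byte-based cut of the UTF-8 data splits the second character in half;
# errors="replace" turns the broken tail into U+FFFD instead of raising.
first_three_bytes = utf8_bytes[:3].decode("utf-8", errors="replace")

print(char_count, latin1_len, utf8_len)
print(repr(first_three_chars), repr(first_three_bytes))
```

With latin1 the two counts always agree, so a substring function never has to choose between them. With UTF-8 it does, and a byte-based cut can land in the middle of a character.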

看我几分像从前
#3 · 2019-02-01 16:44

As far as I can see, latin1 was the default character set before multi-byte support existed, and it looks like that has simply been carried forward, probably for backward compatibility (e.g. for older CREATE statements that didn't specify a collation).

From here:

What 4.0 Did

MySQL 4.0 (and earlier versions) only supported what amounted to a combined notion of the character set and collation with single-byte character encodings, which was specified at the server level. The default was latin1, which corresponds to a character set of latin1 and collation of latin1_swedish_ci in MySQL 4.1.

As to why Swedish, I can only guess that it's because MySQL AB is/was Swedish. I can't see any other reason for choosing this collation: it comes with some specific sorting quirks (ÄÖÜ come after Z, I think), but they are nowhere near an international standard.

等我变得足够好
#4 · 2019-02-01 16:47

latin1 is the default character set. MySQL's latin1 is the same as the Windows cp1252 character set. This means it is the same as the official ISO 8859-1 or IANA (Internet Assigned Numbers Authority) latin1, except that IANA latin1 treats the code points between 0x80 and 0x9f as “undefined,” whereas cp1252, and therefore MySQL's latin1, assign characters for those positions.

from

http://dev.mysql.com/doc/refman/5.0/en/charset-we-sets.html

Might help you understand why.
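The difference in the 0x80 to 0x9f range can be seen with a small Python sketch. Python's `cp1252` and `latin-1` codecs are used here as stand-ins for the two definitions; this is not MySQL's own code, and the three sample bytes are just picked for illustration.

```python
import unicodedata

# Three bytes from the 0x80-0x9f range where the two definitions differ.
raw = bytes([0x80, 0x85, 0x99])

# cp1252 assigns printable characters to these positions.
as_cp1252 = raw.decode("cp1252")

# Python's "latin-1" codec follows ISO 8859-1 and maps each byte to the
# code point with the same number, which here are C1 control characters.
as_iso = raw.decode("latin-1")

# Unicode general category of each decoded character ("Cc" = control).
iso_categories = [unicodedata.category(ch) for ch in as_iso]

print(as_cp1252)
print([hex(ord(ch)) for ch in as_iso], iso_categories)
```

Note that Python's `cp1252` codec leaves a few positions (such as 0x81) unmapped and raises an error for them, so it is not an exact model of how MySQL treats every byte in that range.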

beautiful°
#5 · 2019-02-01 16:48

Most strange features of this kind are historic. They did it like that a long time ago, and now they can't change it without breaking some app that depends on that behavior.

Perhaps UTF-8 wasn't popular then. Or perhaps MySQL didn't yet support charsets where multiple bytes encode one character.
