MYSQL 5.1.61 sorting for Central European language

2019-08-14 05:51发布

I have a problem with sorting MYSQL result..

SELECT * FROM table WHERE something ORDER BY column ASC

column is set to utf8_unicode_ci..

As a result I first get rows which have column starting with Bosnian letters and then the others after that..

šablabl
šeblabla
čeblabla
aaaa
bbaa
bbb
ccc

MYSQL version is 5.1.61

4条回答
爷的心禁止访问
2楼-- · 2019-08-14 06:15

Older question and plenty of answers.

Maybe the way I deal with problems will help someone.

I use PDO. My DB is utf-8.

First - my db singleton code (relevant part of it). I set 'SET NAMES' to 'utf8' for all connections.

     $attrib_array = array(PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES utf8');
     if (DB_HANDLER) 
           $attrib_array[PDO::ATTR_ERRMODE] = PDO::ERRMODE_EXCEPTION;
       self::$instance = new PDO(DB_TYPE.':host='.DB_HOST.';dbname='.DB_NAME, DB_USER, DB_PASS, $attrib_array);

Second - my sorting looks something like this - collation depends on language (sample shows polish):

      ORDER BY some_column COLLATE utf8_polish_ci DESC

To make things more streamlined I use a constant, which I define in lang translation file, so when file is pulled, proper collation constant is set. Of course I have 'utf8_general_ci' as default. Example:

      define('MY_LOCALIZED_COLLATE', 'COLLATE utf8_polish_ci');

Now, my (relevant part of) query looks like this:

      " ... ORDER BY some_column " . MY_LOCALIZED_COLLATE . " DESC" ;

Above works in most cases.

If you are missing collation set, you may try to add one yourself.

More detailed info about creating such set - see here: http://dev.mysql.com/doc/refman/5.0/en/adding-collation.html

EDIT:

Just one more thing I noticed:

  • if you have list to sort in e.g. Polish
  • and you have to force proper collation for sorting (as described above)
  • and you use e.g. INT column as sorting vector

... then you better have collation set (e.g. to UTF8), or you will get SQL errors, e.g.:

"Syntax error or access violation: 1253 COLLATION 'utf8_polish_ci' is not valid for CHARACTER SET 'latin1'"

... strange, but true

查看更多
家丑人穷心不美
3楼-- · 2019-08-14 06:19

Bgi is right. You need to use an appropriate collation. Unfortunately, MySQL doesn't have a Central European unicode collation yet. MariaDb, the MySQL fork being maintained by MySQL's creators, does.

So you can convert your text from utf8 to latin2 and then order with a Central European collating sequence. For example.

  SELECT * 
    FROM tab
ORDER BY CONVERT(text USING latin2) COLLATE latin2_croatian_ci

See this fiddle: http://sqlfiddle.com/#!2/c8dd4/1/0

查看更多
爱情/是我丢掉的垃圾
4楼-- · 2019-08-14 06:25

If that's really what you see you have found a bug: utf8_unicode_ci is supposed to consider š equivalent to s and č equivalent to c!

In any case it's true that MySQL does not have great support of utf8 collations for Central European languages: you get only Czech, Slovak, and Slovenian. If none of those work for you, I guess you'll have to create your own utf8 collation, or use a non-Unicode character set and use the collations available there.

查看更多
神经病院院长
5楼-- · 2019-08-14 06:34

It is because the way of unicode is made. All the "normal" latin characters got back the same numerical correspondance they had in ASCII, and other characters from other cultures were added after. That means if your alphabet has other characters than the 26 regular ASCII ones, it wont appear in the correct order in Unicode.

I think you should try to change the collation on your column (maybe you'll have to change the charset also, but maybe not).

Use a Central European collation.

Good luck !!

查看更多
登录 后发表回答