MYSQL 5.1.61 sorting for Central European language

2019-08-14 05:51发布

问题:

I have a problem with sorting MYSQL result..

SELECT * FROM table WHERE something ORDER BY column ASC

column is set to utf8_unicode_ci..

As a result I first get rows which have column starting with Bosnian letters and then the others after that..

šablabl
šeblabla
čeblabla
aaaa
bbaa
bbb
ccc

MYSQL version is 5.1.61

回答1:

Bgi is right. You need to use an appropriate collation. Unfortunately, MySQL doesn't have a Central European unicode collation yet. MariaDb, the MySQL fork being maintained by MySQL's creators, does.

So you can convert your text from utf8 to latin2 and then order with a Central European collating sequence. For example.

  SELECT * 
    FROM tab
ORDER BY CONVERT(text USING latin2) COLLATE latin2_croatian_ci

See this fiddle: http://sqlfiddle.com/#!2/c8dd4/1/0



回答2:

It is because the way of unicode is made. All the "normal" latin characters got back the same numerical correspondance they had in ASCII, and other characters from other cultures were added after. That means if your alphabet has other characters than the 26 regular ASCII ones, it wont appear in the correct order in Unicode.

I think you should try to change the collation on your column (maybe you'll have to change the charset also, but maybe not).

Use a Central European collation.

Good luck !!



回答3:

If that's really what you see you have found a bug: utf8_unicode_ci is supposed to consider š equivalent to s and č equivalent to c!

In any case it's true that MySQL does not have great support of utf8 collations for Central European languages: you get only Czech, Slovak, and Slovenian. If none of those work for you, I guess you'll have to create your own utf8 collation, or use a non-Unicode character set and use the collations available there.



回答4:

Older question and plenty of answers.

Maybe the way I deal with problems will help someone.

I use PDO. My DB is utf-8.

First - my db singleton code (relevant part of it). I set 'SET NAMES' to 'utf8' for all connections.

     $attrib_array = array(PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES utf8');
     if (DB_HANDLER) 
           $attrib_array[PDO::ATTR_ERRMODE] = PDO::ERRMODE_EXCEPTION;
       self::$instance = new PDO(DB_TYPE.':host='.DB_HOST.';dbname='.DB_NAME, DB_USER, DB_PASS, $attrib_array);

Second - my sorting looks something like this - collation depends on language (sample shows polish):

      ORDER BY some_column COLLATE utf8_polish_ci DESC

To make things more streamlined I use a constant, which I define in lang translation file, so when file is pulled, proper collation constant is set. Of course I have 'utf8_general_ci' as default. Example:

      define('MY_LOCALIZED_COLLATE', 'COLLATE utf8_polish_ci');

Now, my (relevant part of) query looks like this:

      " ... ORDER BY some_column " . MY_LOCALIZED_COLLATE . " DESC" ;

Above works in most cases.

If you are missing collation set, you may try to add one yourself.

More detailed info about creating such set - see here: http://dev.mysql.com/doc/refman/5.0/en/adding-collation.html

EDIT:

Just one more thing I noticed:

  • if you have list to sort in e.g. Polish
  • and you have to force proper collation for sorting (as described above)
  • and you use e.g. INT column as sorting vector

... then you better have collation set (e.g. to UTF8), or you will get SQL errors, e.g.:

"Syntax error or access violation: 1253 COLLATION 'utf8_polish_ci' is not valid for CHARACTER SET 'latin1'"

... strange, but true