PostgreSQL incorrect sorting

2020-02-05 06:56发布

I use PostgreSQL 9.3.3 and I have a table with one column named as title (character varying(50)).

When I have executed the following query:

select * from test
order by title asc

I got the following results:

#
A
#Example

Why "#Example" is in the last position? In my opinion "#Example" should be in the second position.

2条回答
家丑人穷心不美
2楼-- · 2020-02-05 07:38

Sort behaviour for text (including char and varchar as well as the text type) depends on the current collation of your locale.

See previous closely related questions:

If you want to do a simplistic sort by ASCII value, rather than a properly localized sort following your local language rules, you can use the COLLATE clause

select * 
from test
order by title COLLATE "C" ASC

or change the database collation globally (requires dump and reload, or full reindex). On my Fedora 19 Linux system, I get the following results:

regress=> SHOW lc_collate;
 lc_collate  
-------------
 en_US.UTF-8
(1 row)

regress=> WITH v(title) AS (VALUES ('#a'), ('a'), ('#'), ('a#a'), ('a#')) 
          SELECT title FROM v ORDER BY title ASC;
 title 
-------
 #
 a
 #a
 a#
 a#a
(5 rows)

regress=> WITH v(title) AS (VALUES ('#a'), ('a'), ('#'), ('a#a'), ('a#')) 
          SELECT title FROM v ORDER BY title COLLATE "C" ASC;
 title 
-------
 #
 #a
 a
 a#
 a#a
(5 rows)

PostgreSQL uses your operating system's collation support, so it's possible for results to vary slightly from host OS to host OS. In particular, at least some versions of Mac OS X have significantly broken unicode collation handling.

查看更多
神经病院院长
3楼-- · 2020-02-05 07:45

It seems, that when sorting Oracle as well as Postgres just ignore non alpha numeric chars, e.g.

  select '*' 
   union all
  select '#' 
   union all
  select 'A'
   union all
  select '*E'
   union all
  select '*B'
   union all
  select '#C'
   union all
  select '#D'
order by 1 asc

returns (look: that DBMS doesn't pay any attention on prefix before 'A'..'E')

  *
  #
  A
  *B
  #C
  #D
  *E

In your case, what Postgres actually sorts is

'', 'A' and 'Example'

If you put '#' in the middle od the string, the behaviour will be the same:

  select 'A#B'
   union all
  select 'AC'
   union all
  select 'A#D'  
   union all
  select 'AE' 
order by 1 asc

returns (# ignored, and so 'AB', 'AC', 'AD' and 'AE' actually compared)

  A#B
  AC
  A#D
  AE

To change the comparison rules you should use collation, e.g.

  select '#' collate "POSIX"
   union all
  select 'A' collate "POSIX"
   union all
  select '#Example' collate "POSIX"
order by 1 asc

returns (as it required in your case)

  #
  #Example
  A
查看更多
登录 后发表回答