Alphanumeric case-insensitive sorting in Postgres

Posted 2019-01-25 05:01

I am new to Postgres and want to sort varchar columns. Let me explain the problem with the example below:

table name: testsorting

   order       name
    1            b
    2            B
    3            a
    4            a1
    5            a11
    6            a2
    7            a20
    8            A
    9            a19

Case-sensitive sorting (which is the default in Postgres) gives:

select name from testsorting order by name;

    A
    B
    a
    a1
    a11
    a19
    a2
    a20
    b

Case-insensitive sorting gives:

select name from testsorting order by UPPER(name);

      A
      a
      a1
      a11
      a19
      a2
      a20
      B
      b

How can I do alphanumeric case-insensitive sorting in Postgres to get the order below?

          a
          A
          a1
          a2
          a11
          a19
          a20
          b
          B

I don't mind whether capital or small letters come first, but the order should be "aAbB" or "AaBb" and should not be "ABab".

Please suggest a solution if there is one in Postgres.

5 answers
劫难
#2 · 2019-01-25 05:39

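I agree with Clodoaldo Neto's answer, but don't forget to add a matching index. The index expressions have to match the ORDER BY expressions exactly, and an expression that is not a plain function call needs its own parentheses:

CREATE INDEX testsorting_name ON testsorting (upper(left(name, 1)), ((substring(name from 2) || '0')::integer));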
干净又极端
#3 · 2019-01-25 05:45

This answer is strongly inspired by this one.
Wrapping the logic in a function keeps things clean if you need it in several queries.

CREATE OR REPLACE FUNCTION alphanum(str anyelement)
   RETURNS anyelement AS $$
BEGIN
   RETURN (SUBSTRING(str, '^[^0-9]*'),
      COALESCE(SUBSTRING(str, '[0-9]+')::INT, -1) + 2000000);
END;
$$ LANGUAGE plpgsql IMMUTABLE;

Then you could use it this way:

SELECT name FROM testsorting ORDER BY alphanum(name);

Test:

WITH x(name) AS (VALUES ('b'), ('B'), ('a'), ('a1'),
   ('a11'), ('a2'), ('a20'), ('A'), ('a19'))
SELECT name, alphanum(name) FROM x ORDER BY alphanum(name);

 name |  alphanum   
------+-------------
 a    | (a,1999999)
 A    | (A,1999999)
 a1   | (a,2000001)
 a2   | (a,2000002)
 a11  | (a,2000011)
 a19  | (a,2000019)
 a20  | (a,2000020)
 b    | (b,1999999)
 B    | (B,1999999)
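
The test output above shows alphanum returning a single value such as (a,1999999), so whether a and A end up next to each other still depends on how that value compares under the database collation. As a hedged alternative sketch (not the author's function), the same two regular expressions can be used as two separate sort keys, with lower() making the letter part explicitly case-insensitive. It assumes each name has at most one digit run that fits in an integer.

```sql
-- Sketch: sort by the leading non-digit prefix (case-folded), then by the
-- first run of digits as a number. Assumes the digit run fits in an integer.
WITH x(name) AS (
    VALUES ('b'), ('B'), ('a'), ('a1'), ('a11'),
           ('a2'), ('a20'), ('A'), ('a19')
)
SELECT name
FROM x
ORDER BY lower(substring(name, '^[^0-9]*')),                -- letters before the first digit
         COALESCE(substring(name, '[0-9]+')::integer, -1);  -- -1 puts digit-less names first
```

Rows that differ only in case (a and A, b and B) tie on both keys, so their relative order is unspecified unless you add name as a final sort key.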
我欲成王,谁敢阻挡
#4 · 2019-01-25 05:52

If the name is always one letter followed by n digits, then:

select name
from testsorting
order by
    upper(left(name, 1)),
    (substring(name from 2) || '0')::integer
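
The appended '0' is what lets a bare letter be cast: for 'a' the substring is empty, and an empty string cannot be cast to integer, whereas '0' can. Every other suffix is just multiplied by ten, which keeps the numeric order intact. The sketch below makes both sort keys visible on the question's sample data; the aliases letter_key and number_key are illustrative names, not part of the original query.

```sql
-- Self-contained check of the ORDER BY above, using the question's rows.
WITH testsorting(name) AS (
    VALUES ('b'), ('B'), ('a'), ('a1'), ('a11'),
           ('a2'), ('a20'), ('A'), ('a19')
)
SELECT name,
       upper(left(name, 1))                     AS letter_key,  -- 'A' or 'B'
       (substring(name from 2) || '0')::integer AS number_key   -- '' -> 0, '11' -> 110
FROM testsorting
ORDER BY letter_key, number_key;
```

Here a and A both get the keys ('A', 0), so they sort together but in no guaranteed order relative to each other.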
Rolldiameter
#5 · 2019-01-25 05:53

My PostgreSQL sorts the way you want. How PostgreSQL compares strings is determined by locale and collation. When you create a database with createdb, the -l option sets the locale. You can also check how your environment is configured with psql -l:

[postgres@test]$ psql -l
List of databases
 Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
---------+----------+----------+------------+------------+-----------------------
 mn_test | postgres | UTF8     | pl_PL.UTF8 | pl_PL.UTF8 |

As you can see, my database uses the Polish collation.
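
If you would rather check this from inside a SQL session, the same information is available in the pg_database catalog. This is a small sketch; the column names are those of the standard catalog, and the output will differ per installation.

```sql
-- Collation and character-type locale of every database, as psql -l reports them.
SELECT datname, datcollate, datctype
FROM pg_database
ORDER BY datname;

-- The same settings for the database the current session is connected to.
SELECT datcollate, datctype
FROM pg_database
WHERE datname = current_database();
```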

If your database was created with a different collation, you can specify another collation in the query itself:

SELECT * FROM sort_test ORDER BY name COLLATE "C";
SELECT * FROM sort_test ORDER BY name COLLATE "default";
SELECT * FROM sort_test ORDER BY name COLLATE "pl_PL";

You can list available collations by:

SELECT * FROM pg_collation;

EDITED:

Oh, I missed that 'a2' must come before 'a11'.

I don't think a standard collation can solve alphanumeric sorting. For that you have to split the string into parts, as in Clodoaldo Neto's answer. Another option, useful if you frequently order this way, is to split the name field into two columns. You can create a trigger on INSERT and UPDATE that splits name into name_1 and name_2, and then:

SELECT name FROM sort_test ORDER BY name_1 COLLATE "en_EN", name_2;

(I changed the collation from Polish to English; you should use your native collation to sort letters like aącć etc.)
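
One more option to consider, assuming PostgreSQL 10 or later built with ICU support: an ICU collation can be created with numeric ordering enabled, so that digit runs compare as numbers. This is only a sketch; natural_ci is a made-up collation name, and the result should be checked on your own server.

```sql
-- Assumes PostgreSQL 10+ with ICU; "natural_ci" is a hypothetical name.
-- The "kn" setting asks ICU to compare digit sequences by numeric value.
CREATE COLLATION natural_ci (provider = icu, locale = 'en-u-kn-true');

-- Use it per query, without changing the column or the database default.
SELECT name FROM sort_test ORDER BY name COLLATE natural_ci;
```

This affects ordering only; equality comparisons on name remain case-sensitive.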

ら.Afraid
#6 · 2019-01-25 05:54

PostgreSQL uses the C library's locale facilities for sorting strings. The C library is provided by the host operating system. On Mac OS X or a BSD-family operating system, the UTF-8 locale definitions are broken, and hence the results are as per collation "C".

(Image in the original post: collation results with Ubuntu 15.04 as the host OS.)

Check the FAQ on the Postgres wiki for more details: https://wiki.postgresql.org/wiki/FAQ
