Natural Sort in MySQL

Posted 2018-12-31 05:24

Is there an elegant way to have performant, natural sorting in a MySQL database?

For example, if I have this data set:

  • Final Fantasy
  • Final Fantasy 4
  • Final Fantasy 10
  • Final Fantasy 12
  • Final Fantasy 12: Chains of Promathia
  • Final Fantasy Adventure
  • Final Fantasy Origins
  • Final Fantasy Tactics

Is there a more elegant solution than splitting the games' names into their components

  • Title: "Final Fantasy"
  • Number: "12"
  • Subtitle: "Chains of Promathia"

to make sure that they come out in the right order? (10 after 4, not before 2).

Doing so is a pain in the a** because every now and then there's another game that breaks the title-parsing mechanism (e.g. "Warhammer 40,000", "James Bond 007").

19 answers
怪性笑人.
Reply #2 · 2018-12-31 06:11

MySQL has no built-in "natural sorting", so it looks like the best way to get what you're after is to split your data up as you've described above (separate id field, etc.) or, failing that, to sort on a non-title, indexed element in your db (date, inserted id, etc.).

Having the db do the sorting for you is almost always going to be quicker than reading large data sets into your programming language of choice and sorting them there. So if you have any control over the db schema, look at adding easily sorted fields as described above; it'll save you a lot of hassle and maintenance in the long run.

Requests to add a "natural sort" come up from time to time on the MySQL bugs and discussion forums, and many solutions revolve around stripping out specific parts of your data and casting them for the ORDER BY part of the query, e.g.

SELECT * FROM `table` ORDER BY CAST(MID(name, 6, LENGTH(name) - 5) AS UNSIGNED)

This sort of solution could just about be made to work on your Final Fantasy example above, but it isn't particularly flexible and is unlikely to extend cleanly to a dataset including, say, "Warhammer 40,000" and "James Bond 007", I'm afraid.
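To see why a fixed-offset approach is brittle, here is a small Python sketch of the same idea outside the database. The helper names and the hard-coded prefix are hypothetical, and `leading_unsigned` is only a rough stand-in for casting to an unsigned integer, not a reproduction of MySQL's exact rules.

```python
import re


def leading_unsigned(text):
    """Rough stand-in for an unsigned cast: leading digits, else 0."""
    match = re.match(r"\s*([0-9]+)", text)
    return int(match.group(1)) if match else 0


def fixed_prefix_key(name, prefix="Final Fantasy "):
    # Blindly drop as many characters as the assumed prefix has,
    # whether or not the title actually starts with that prefix.
    return leading_unsigned(name[len(prefix):])


titles = [
    "Final Fantasy 10",
    "Final Fantasy 4",
    "Final Fantasy 12",
    "Warhammer 40,000",
    "James Bond 007",
]
for title in titles:
    print(title, "->", fixed_prefix_key(title))
```

The three Final Fantasy titles get the keys 10, 4 and 12, while both "Warhammer 40,000" and "James Bond 007" collapse to 0, because the offset was chosen for one series only.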

素衣白纱
Reply #3 · 2018-12-31 06:11

I know this topic is ancient but I think I've found a way to do this:

SELECT * FROM `table` ORDER BY 
CONCAT(
  GREATEST(
    LOCATE('1', name),
    LOCATE('2', name),
    LOCATE('3', name),
    LOCATE('4', name),
    LOCATE('5', name),
    LOCATE('6', name),
    LOCATE('7', name),
    LOCATE('8', name),
    LOCATE('9', name)
   ),
   name
) ASC

Scrap that, it sorted the following set incorrectly (It's useless lol):

Final Fantasy 1
Final Fantasy 2
Final Fantasy 5
Final Fantasy 7
Final Fantasy 7: Advent Children
Final Fantasy 12
Final Fantasy 112
FF1
FF2

几人难应
Reply #4 · 2018-12-31 06:12

Here is a quick solution:

SELECT alphanumeric, 
       `integer`
FROM sorting_test
ORDER BY LENGTH(alphanumeric), alphanumeric
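
The length-then-value ordering can be tried outside the database with a small Python sketch. The function name and sample data are only illustrative; the point is to show where the trick works and where it stops working.

```python
def length_then_alpha(values):
    """Sort by string length first, then by ordinary string comparison."""
    return sorted(values, key=lambda v: (len(v), v))


print(length_then_alpha(["a10", "a2", "a1"]))
# -> ['a1', 'a2', 'a10']

titles = [
    "Final Fantasy",
    "Final Fantasy 4",
    "Final Fantasy 10",
    "Final Fantasy 12",
    "Final Fantasy 12: Chains of Promathia",
    "Final Fantasy Adventure",
    "Final Fantasy Origins",
    "Final Fantasy Tactics",
]
for title in length_then_alpha(titles):
    print(title)
```

For values that share a prefix and differ only in a trailing number, this gives the natural order. With the titles from the question, however, "Final Fantasy 12: Chains of Promathia" ends up last, after "Final Fantasy Adventure", simply because it is the longest string.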
低头抚发
Reply #5 · 2018-12-31 06:18

I wrote this function for MSSQL 2000 a while ago:

/**
 * Returns a string formatted for natural sorting. This function is very useful when having to sort alpha-numeric strings.
 *
 * @author Alexandre Potvin Latreille (plalx)
 * @param {nvarchar(4000)} string The string to format.
 * @param {int} numberLength The length each number should have (including padding). This should be the length of the longest number. Defaults to 10.
 * @param {char(50)} sameOrderChars A list of characters that should have the same order. Ex: '.-/'. Defaults to empty string.
 *
 * @return {nvarchar(4000)} A string for natural sorting.
 * Example of use: 
 * 
 *      SELECT Name FROM TableA ORDER BY Name
 *  TableA (unordered)              TableA (ordered)
 *  ------------------              ----------------
 *  ID  Name                        ID  Name
 *  1.  A1.                         1.  A1-1.
 *  2.  A1-1.                       2.  A1.
 *  3.  R1              -->         3.  R1
 *  4.  R11                         4.  R11
 *  5.  R2                          5.  R2
 *
 *  
 *  As we can see, humans would expect A1., A1-1., R1, R2, R11 but that's not how SQL is sorting it.
 *  We can use this function to fix this.
 *
 *      SELECT Name FROM TableA ORDER BY dbo.udf_NaturalSortFormat(Name, default, '.-')
 *  TableA (unordered)              TableA (ordered)
 *  ------------------              ----------------
 *  ID  Name                        ID  Name
 *  1.  A1.                         1.  A1.
 *  2.  A1-1.                       2.  A1-1.
 *  3.  R1              -->         3.  R1
 *  4.  R11                         4.  R2
 *  5.  R2                          5.  R11
 */
CREATE FUNCTION dbo.udf_NaturalSortFormat(
    @string nvarchar(4000),
    @numberLength int = 10,
    @sameOrderChars char(50) = ''
)
RETURNS nvarchar(4000)
AS
BEGIN
    DECLARE @sortString nvarchar(4000),
        @numStartIndex int,
        @numEndIndex int,
        @padLength int,
        @totalPadLength int,
        @i int,
        @sameOrderCharsLen int;

    SELECT 
        @totalPadLength = 0,
        @string = RTRIM(LTRIM(@string)),
        @sortString = @string,
        @numStartIndex = PATINDEX('%[0-9]%', @string),
        @numEndIndex = 0,
        @i = 1,
        @sameOrderCharsLen = LEN(@sameOrderChars);

    -- Replace all char that has to have the same order by a space.
    WHILE (@i <= @sameOrderCharsLen)
    BEGIN
        SET @sortString = REPLACE(@sortString, SUBSTRING(@sameOrderChars, @i, 1), ' ');
        SET @i = @i + 1;
    END

    -- Pad numbers with zeros.
    WHILE (@numStartIndex <> 0)
    BEGIN
        SET @numStartIndex = @numStartIndex + @numEndIndex;
        SET @numEndIndex = @numStartIndex;

        WHILE(PATINDEX('[0-9]', SUBSTRING(@string, @numEndIndex, 1)) = 1)
        BEGIN
            SET @numEndIndex = @numEndIndex + 1;
        END

        SET @numEndIndex = @numEndIndex - 1;

        SET @padLength = @numberLength - (@numEndIndex + 1 - @numStartIndex);

        IF @padLength < 0
        BEGIN
            SET @padLength = 0;
        END

        SET @sortString = STUFF(
            @sortString,
            @numStartIndex + @totalPadLength,
            0,
            REPLICATE('0', @padLength)
        );

        SET @totalPadLength = @totalPadLength + @padLength;
        SET @numStartIndex = PATINDEX('%[0-9]%', RIGHT(@string, LEN(@string) - @numEndIndex));
    END

    RETURN @sortString;
END

GO
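
For readers who want to experiment with the same idea outside SQL Server, here is a minimal Python sketch of the technique the function describes: replace the "same order" characters with spaces, then left-pad every run of digits with zeros. It is an illustration under those assumptions, not a line-by-line port, and the Python names are my own.

```python
import re


def natural_sort_format(text, number_length=10, same_order_chars=""):
    """Return a key string in which every digit run is zero-padded."""
    text = text.strip()
    # Characters that should share the same rank all become a space.
    for ch in same_order_chars:
        text = text.replace(ch, " ")
    # Pad each run of digits to number_length; longer runs are left as-is.
    return re.sub(r"[0-9]+", lambda m: m.group(0).zfill(number_length), text)


names = ["A1.", "A1-1.", "R1", "R11", "R2"]
ordered = sorted(names, key=lambda n: natural_sort_format(n, same_order_chars=".-"))
print(ordered)
# -> ['A1.', 'A1-1.', 'R1', 'R2', 'R11']
```

As in the SQL version, `number_length` has to be at least as long as the longest number in the data, otherwise longer numbers are compared as plain text again.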
骚的不知所云
Reply #6 · 2018-12-31 06:18

So, while I know that you have found a satisfactory answer, I was struggling with this problem for a while. We'd previously determined that it could not be done reasonably well in SQL and that we were going to have to use JavaScript on a JSON array.

Here's how I solved it just using SQL. Hopefully this is helpful for others:

I had data such as:

Scene 1
Scene 1A
Scene 1B
Scene 2A
Scene 3
...
Scene 101
Scene XXA1
Scene XXA2

I actually didn't "cast" things, though I suppose that may also have worked.

I first removed the part of the data that never changes, in this case "Scene ", and then applied LPAD to line things up. This seems to allow pretty well for the alpha strings to sort properly as well as the numbered ones.

My ORDER BY clause looks like:

ORDER BY LPAD(REPLACE(`table`.`column`,'Scene ',''),10,'0')

Obviously this doesn't help with the original problem, which was not so uniform, but I imagine it would work for many other related problems, so I'm putting it out there.
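
The strip-and-pad key can be checked quickly in Python. The `lpad` helper below is a hypothetical, rough stand-in for MySQL's LPAD with a single pad character (pad on the left, truncate when the input is already too long), and the sample list is taken from the scene names above.

```python
def lpad(text, length, pad):
    """Rough stand-in for LPAD with a one-character pad string."""
    if len(text) >= length:
        return text[:length]  # too-long input is cut to the target length
    return text.rjust(length, pad)


def scene_key(name):
    # Drop the constant "Scene " part, then left-pad what remains.
    return lpad(name.replace("Scene ", ""), 10, "0")


scenes = ["Scene 101", "Scene 2A", "Scene XXA1", "Scene 1",
          "Scene 3", "Scene 1B", "Scene 1A"]
for scene in sorted(scenes, key=scene_key):
    print(scene_key(scene), scene)
```

In this sketch, purely numeric scenes come out in numeric order and "Scene 101" lands after the short ones. One thing to watch: "Scene 3" sorts before "Scene 1A", because the padded keys `0000000003` and `000000001A` are compared character by character. Whether that matters depends on the data.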

栀子花@的思念
Reply #7 · 2018-12-31 06:18

You can also create the "sort column" dynamically:

SELECT name, (name = '-') boolDash, (name = '0') boolZero, (name+0 > 0) boolNum 
FROM `table` 
ORDER BY boolDash DESC, boolZero DESC, boolNum DESC, (name+0), name

That way, you can create groups to sort.

In my query, I wanted the '-' in front of everything, then the numbers, then the text, which could result in something like:

-
0    
1
2
3
4
5
10
13
19
99
102
Chair
Dog
Table
Windows

That way you don't have to maintain the sort column in the correct order as you add data. You can also change your sort order depending on what you need.
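
The grouping idea can be sketched in Python with a tuple key. `numeric_prefix` is a hypothetical, simplified stand-in for the `name+0` conversion (unsigned leading digits only), and the booleans are inverted because Python sorts `False` before `True`, whereas the query uses `DESC`.

```python
import re


def numeric_prefix(text):
    """Simplified stand-in for `name+0`: leading digits as int, else 0."""
    match = re.match(r"\s*([0-9]+)", text)
    return int(match.group(1)) if match else 0


def group_key(name):
    number = numeric_prefix(name)
    return (
        name != "-",   # the dash group first
        name != "0",   # then a literal zero
        number <= 0,   # then everything that converts to a positive number
        number,        # numbers in numeric order
        name,          # remaining text in alphabetical order
    )


values = ["Dog", "10", "-", "2", "Chair", "0", "102", "1", "Table", "99"]
print(sorted(values, key=group_key))
# -> ['-', '0', '1', '2', '10', '99', '102', 'Chair', 'Dog', 'Table']
```

Reordering or negating the leading elements of the tuple changes which group comes first, which mirrors changing the order of the `ORDER BY` terms.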
