Python - Human sort of numbers with alphanumeric strings

Posted 2019-01-18 17:02

Question:

I have data rows and wish to have them presented as follows:

1
1a
1a2
2
3
9
9.9
10
10a
11
100
100ab
ab
aB
AB

Since I am using PyQt and the code lives inside a TreeWidgetItem, the method I'm trying to fix is:

def __lt__(self, otherItem):
    column = self.treeWidget().sortColumn()

    #return self.text(column).toLower() < otherItem.text(column).toLower()

    orig = str(self.text(column).toLower()).rjust(20, "0")
    other = str(otherItem.text(column).toLower()).rjust(20, "0")
    return orig < other

Answer 1:

This may help you. Edit the regexp to match the digit patterns you're interested in. Mine converts every numeric field to a float, so fields containing a . are compared as decimals. It uses swapcase() to invert the case so that 'A' sorts after 'a'.

Updated and refined:

import re

def _human_key(key):
    # The capturing group keeps the numbers in the split result, so
    # even indexes hold text and odd indexes hold numeric fields.
    parts = re.split(r'(\d*\.\d+|\d+)', key)
    return tuple((e.swapcase() if i % 2 == 0 else float(e))
            for i, e in enumerate(parts))

nums = ['9', 'aB', '1a2', '11', 'ab', '10', '2', '100ab', 'AB', '10a',
    '1', '1a', '100', '9.9', '3']
nums.sort(key=_human_key)

print('\n'.join(nums))

Output:

1
1a
1a2
2
3
9
9.9
10
10a
11
100
100ab
ab
aB
AB

Update (response to comment): If you have a class Foo and want to implement __lt__ using the _human_key sorting scheme, just return the result of _human_key(k1) < _human_key(k2):

class Foo(object):

    def __init__(self, key):
        self.key = key

    def __lt__(self, obj):
        return _human_key(self.key) < _human_key(obj.key)

>>> Foo('ab') < Foo('AB')
True
>>> Foo('AB') < Foo('AB')
False

So for your case, you'd do something like this:

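def __lt__(self, other):
    column = self.treeWidget().sortColumn()
    # text() returns a QString under PyQt4's default API, so convert
    # it to a Python string before handing it to re.split().
    k1 = str(self.text(column))
    k2 = str(other.text(column))
    return _human_key(k1) < _human_key(k2)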

The other comparison operators (__eq__, __gt__, etc) would be implemented in the same way.
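If writing all six operators by hand feels repetitive, one possible shortcut is functools.total_ordering from the standard library, which fills in the rest once __eq__ and one ordering method exist. The sketch below assumes a Foo class like the one above; the __hash__ method is my own addition so that instances stay hashable on Python 3.

```python
import re
from functools import total_ordering

def _human_key(key):
    parts = re.split(r'(\d*\.\d+|\d+)', key)
    return tuple((e.swapcase() if i % 2 == 0 else float(e))
                 for i, e in enumerate(parts))

@total_ordering
class Foo(object):
    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        return _human_key(self.key) == _human_key(other.key)

    def __lt__(self, other):
        return _human_key(self.key) < _human_key(other.key)

    def __hash__(self):
        # Keep hashing consistent with __eq__.
        return hash(_human_key(self.key))

print(Foo('2') < Foo('10'))      # True
print(Foo('10a') >= Foo('9.9'))  # True, __ge__ is supplied by total_ordering
```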



Answer 2:

Using samplebias's swapcase idea, and Ned Batchelder's human-sort code, you might do it this way:

import re

def human_keys(astr):
    '''
    alist.sort(key=human_keys) sorts in human order
    '''
    keys = []
    for elt in re.split(r'(\d+)', astr):
        elt = elt.swapcase()
        try:
            elt = int(elt)   # digit runs compare as integers
        except ValueError:
            pass             # text parts stay as case-swapped strings
        keys.append(elt)
    return keys

x='''
    1
    1a
    1a2
    2
    3
    9
    9.9
    9.10
    9a2
    10
    10a
    11
    100
    100ab
    ab
    aB
    AB
    '''.split()

print(x)
assert x == sorted(x,key=human_keys)
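One visible difference from a float-based key: because this version splits on integer runs only, '9.10' is read as 9, '.', 10 and sorts after '9.9', whereas a key that parses decimals reads it as 9.1 and sorts it before. A small sketch of the contrast follows; float_key is a hypothetical name for the float-based variant, reproduced here so the snippet runs on its own.

```python
import re

def human_keys(astr):
    keys = []
    for elt in re.split(r'(\d+)', astr):
        elt = elt.swapcase()
        try:
            elt = int(elt)
        except ValueError:
            pass
        keys.append(elt)
    return keys

def float_key(astr):
    # Hypothetical float-based variant: whole decimals become one number.
    parts = re.split(r'(\d*\.\d+|\d+)', astr)
    return tuple(e.swapcase() if i % 2 == 0 else float(e)
                 for i, e in enumerate(parts))

data = ['9.10', '9.9', '9']
print(sorted(data, key=human_keys))  # ['9', '9.9', '9.10']
print(sorted(data, key=float_key))   # ['9', '9.10', '9.9']
```

Which behaviour is right depends on whether the dotted values are decimals or version-like labels.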

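You could apply human_keys in __lt__ like this. Note that the toLower() and rjust() calls from the question are dropped: lowercasing throws away the case that swapcase() relies on, and zero-padding would make a text-only value such as 'ab' parse as the number 0 followed by text, so it would sort first.

def __lt__(self, otherItem):
    column = self.treeWidget().sortColumn()
    orig = str(self.text(column))
    other = str(otherItem.text(column))
    return human_keys(orig) < human_keys(other)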


Answer 3:

I don't understand your sort algorithm, so I can't tell you how to implement it. But there is a general technique, which is to use the key parameter of Python's built-in sorting functions. In other words, you want to come up with some transformation of your data which Python would sort in the correct order, and then write that transformation as a Python function foo and call sorted(data, key=foo) (or data.sort(key=foo) to sort in place).


Example: if you had a list of strings of the form "<integer>-<integer>", say ["1-1","1-2","3-1"], and you wanted to sort by the second number and then the first, notice that Python would sort the data correctly if it were in the form [(1,1), (2,1), (1,3)], i.e. a list of reversed tuples. So you would write a function

def key(s):
    left, right = s.split("-")
    # Second number first, so it becomes the primary sort field.
    return int(right), int(left)

and then sort the list with sorted(data, key=key).
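Put together as something you can run, the idea looks roughly like this. The function name second_then_first and the sample list are made up for the illustration.

```python
def second_then_first(s):
    # "3-1" -> (1, 3): the number after the dash becomes the primary field.
    left, right = s.split("-")
    return int(right), int(left)

data = ["3-1", "1-2", "1-1"]
print(sorted(data, key=second_then_first))  # ['1-1', '3-1', '1-2']
```

The original strings are what end up in the result; the tuples exist only while sorting.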



Answer 4:

Here's a function that, given a string with a mixture of alphabetical and numeric parts, returns a tuple that will sort in a "natural" way.

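def naturalkey(key, convert=int):
    if not key:
        return ()
    keys = []
    start = 0
    extra = ""
    in_num = key[0].isdigit()
    for i, char in enumerate(key):
        if start < i:
            if in_num:
                try:
                    last_num = convert(key[start:i])
                except ValueError:
                    in_num = False
                    if i > 2 and key[i-2] == ".":
                        extra = "."
                    keys.append(last_num)
                    start = i-1
            if not in_num:  # this is NOT equivalent to `else`!
                if char.isdigit():
                    keys.append(extra + key[start:i])
                    in_num = True
                    start = i
                    extra = ""
                    last_num = convert(char)
    if in_num:
        # The loop only converts key[start:i], which never includes the
        # final character, so convert the whole trailing run here.
        try:
            keys.append(convert(key[start:]))
        except ValueError:
            # The final character ended the number.
            tail = key[-1]
            if len(key) > 2 and key[-2] == ".":
                tail = "." + tail  # hand the dot back to the text part
            keys.append(last_num)
            keys.append(tail)
    else:
        keys.append(extra + key[start:])
    return tuple(keys)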

The basic approach it uses is, when it sees a digit, it gathers additional characters and keeps trying to convert the result to a number until it can't anymore (i.e. it gets an exception). By default it tries to convert runs of characters to an integer, but you can pass in convert=float to have it accept decimal points. (It won't accept scientific notation, unfortunately, since to get something like '1e3' it would first try to parse '1e' which is invalid. This, along with the + or - sign, could be special-cased but it doesn't look like that is necessary for your use case.)
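The "keep converting until it fails" idea is easier to see in isolation with a much smaller helper. This is only a sketch of the probing step, not the function above; numeric_prefix is a hypothetical name, and it assumes the text starts with a digit.

```python
def numeric_prefix(text, convert=float):
    """Grow a prefix one character at a time until convert() rejects it."""
    value, end = None, 0
    for i in range(1, len(text) + 1):
        try:
            value = convert(text[:i])
        except ValueError:
            break          # the character just added broke the number
        end = i
    return value, text[end:]

print(numeric_prefix("9.9ab"))          # (9.9, 'ab')
print(numeric_prefix("2000.exe", int))  # (2000, '.exe')
print(numeric_prefix("1e3"))            # (1.0, 'e3'), because float('1e') fails
```

The last line shows the scientific-notation limitation: the probe stops at '1e' and never gets to try '1e3'.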

The function returns a tuple containing strings and numbers in the order they were found in the string, with the numbers parsed to the specified numeric type. For example:

>>> naturalkey("foobar2000.exe")
('foobar', 2000, '.exe')

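This tuple can be used as a key for sorting a list of strings. (A key may start with either a number or a string. Python 2 orders numbers before strings, but on Python 3 comparing a key that starts with a number against one that starts with a string raises TypeError.)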

my_list.sort(key=lambda i: naturalkey(i, float))

Or you can use it to implement a comparison function:

def __lt__(self, other):
    return naturalkey(self.value, float) < naturalkey(other.value, float)

It would be better (faster) to generate the natural key in the object's __init__() method, store it in the instance, and write your comparison function(s) to use the stored value instead. If the value from which the key is derived is mutable, you could write a property that updates the key when the underlying value is updated.
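A rough sketch of that caching idea follows. The Item class and the regex-based natural_key are stand-ins I made up to keep the example short; any key function could be plugged in instead.

```python
import re

def natural_key(text):
    # Stand-in key function (an assumption for this sketch).
    return tuple(int(p) if p.isdigit() else p
                 for p in re.split(r'(\d+)', text))

class Item(object):
    def __init__(self, value):
        self.value = value  # goes through the property setter below

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        self._value = new_value
        self._key = natural_key(new_value)  # recomputed only on change

    def __lt__(self, other):
        return self._key < other._key       # no parsing during comparison

items = [Item("file10"), Item("file2"), Item("file1")]
print([it.value for it in sorted(items)])  # ['file1', 'file2', 'file10']

items[0].value = "file0"                   # the setter refreshes the cached key
print([it.value for it in sorted(items)])  # ['file0', 'file1', 'file2']
```

Sorting compares each item many times, so parsing once per assignment rather than once per comparison is where the saving comes from.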