Using Python 3.x, I have a list of strings for which I would like to perform a natural alphabetical sort.
Natural sort: The order by which files in Windows are sorted.
For instance, the following list is naturally sorted (what I want):
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
And here's the "sorted" version of the above list (what I have):
['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
I'm looking for a sort function which behaves like the first one.
There are many implementations out there, and while some come close, none quite captures the elegance that modern Python affords.
Caution when using `from os.path import split`
Inspiration from
Value Of This Post
My point is to offer a non-regex solution that can be applied generally.
I'll create three functions:
- `find_first_digit`, which I borrowed from @AnuragUniyal. It will find the position of the first digit or non-digit in a string.
- `split_digits`, which is a generator that picks apart a string into digit and non-digit chunks. It will also `yield` integers when a chunk is a digit.
- `natural_key`, which just wraps `split_digits` into a `tuple`. This is what we use as a key for `sorted`, `max`, `min`.
Functions
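The original code did not survive in this copy, so here is a minimal sketch of what the three functions described above could look like. The signatures, the `non` and `case` parameters, and the choice of `str.isdecimal()` (so that every digit run is safe to pass to `int()`) are my assumptions, not necessarily the author's exact code.

```python
def find_first_digit(s, non=False):
    """Return the index of the first digit in s, or of the first
    non-digit when non=True. Return -1 if no such character exists."""
    for i, ch in enumerate(s):
        if ch.isdecimal() != non:
            return i
    return -1


def split_digits(s, case=False):
    """Yield the text chunks and digit chunks of s in order.

    Digit chunks are yielded as int. Text chunks are lowercased
    unless case=True (case-sensitive comparison).
    """
    while s:
        starts_with_digit = s[0].isdecimal()
        # Find where the current run (digits or text) ends.
        i = find_first_digit(s, non=starts_with_digit)
        chunk, s = (s, "") if i == -1 else (s[:i], s[i:])
        if starts_with_digit:
            yield int(chunk)
        else:
            yield chunk if case else chunk.lower()


def natural_key(s, case=False):
    """Tuple key usable with sorted(), max() and min()."""
    return tuple(split_digits(s, case))


names = ['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
print(sorted(names, key=natural_key))
# ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
```

One caveat of this kind of key: if some strings start with digits and others with letters, Python 3 will raise a `TypeError` when it has to compare an `int` chunk with a `str` chunk.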
We can see that it is general in that we can have multiple digit chunks:
Or leave as case sensitive:
We can see that it sorts the OP's list in the appropriate order
But it can handle more complicated lists as well:
My regex equivalent would be
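A regex-based key along these lines might look like the sketch below. The name `natural_key_re` and the `case` parameter are placeholders of mine; the pattern simply alternates between runs of digits and runs of non-digits.

```python
import re


def natural_key_re(s, case=False):
    """Split s into digit runs and non-digit runs with one regex.

    Digit runs become int; text runs are lowercased unless case=True.
    """
    chunks = re.findall(r"\d+|\D+", s)
    return tuple(
        int(c) if c.isdecimal() else (c if case else c.lower())
        for c in chunks
    )


names = ['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
print(sorted(names, key=natural_key_re))
# ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
```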
I wrote a function based on http://www.codinghorror.com/blog/2007/12/sorting-for-humans-natural-sort-order.html which adds the ability to still pass in your own 'key' parameter. I need this in order to perform a natural sort of lists that contain more complex objects (not just strings).
For example:
Given:
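The function and example data are missing from this copy. As a rough sketch of the idea (an in-place natural sort that still accepts a caller-supplied `key`), something like the following should work. The name `natural_sort`, the lowercasing of text chunks, and the record dictionaries are illustrative assumptions.

```python
import re


def natural_sort(items, key=lambda x: x):
    """Sort items in place in natural order.

    key extracts the string to compare from each item, so the list
    may hold arbitrary objects rather than plain strings.
    """
    def alphanum_key(item):
        # Splitting on a capturing group keeps the digit runs, so the
        # result always alternates text, number, text, ...
        parts = re.split(r"(\d+)", key(item))
        return [int(p) if p.isdecimal() else p.lower() for p in parts]

    items.sort(key=alphanum_key)


# Hypothetical records: sort by the 'name' field.
records = [{'name': 'elm10'}, {'name': 'Elm2'}, {'name': 'elm1'}]
natural_sort(records, key=lambda r: r['name'])
print([r['name'] for r in records])
# ['elm1', 'Elm2', 'elm10']
```

Because `re.split` with a capturing group always puts text at even positions and digit runs at odd positions, strings that start with a digit still compare cleanly against strings that start with a letter.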
Similar to SergO's solution, a 1-liner without external libraries would be:
or
Explanation:
This solution uses the key parameter of sort to define the function that is used for the comparison. Because we know that every entry starts with the same three-character prefix ('elm' or 'Elm'), the key function converts the portion of the string after the 3rd character to an integer (i.e. int(x[3:])). If the numerical part of the data is in a different location, this part of the function has to change. Note that it only works when everything after the prefix is a valid integer.
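The one-liner described above can be sketched as follows, both as a new sorted list and as an in-place sort. The variable name `names` is mine, and the code assumes every entry is exactly a three-character prefix followed only by digits.

```python
names = ['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']

# Variant 1: return a new list, ordered by the integer after the prefix.
by_suffix = sorted(names, key=lambda x: int(x[3:]))
print(by_suffix)
# ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

# Variant 2: sort the existing list in place with the same key.
names.sort(key=lambda x: int(x[3:]))
```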
Cheers
There is a third party library for this on PyPI called natsort (full disclosure, I am the package's author). For your case, you can do either of the following:
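The two snippets are missing here. Based on the natsort API as I know it (`natsorted`, its `key` argument, and the `ns.IGNORECASE` flag), they could be written as below. The wrapper function names are mine, and the imports sit inside the functions only so that the example loads even where natsort is not installed (`pip install natsort`).

```python
def natsorted_with_lower_key(items):
    """Natural sort, ignoring case by lowercasing each item first."""
    from natsort import natsorted  # third-party package
    return natsorted(items, key=lambda y: y.lower())


def natsorted_ignore_case(items):
    """Natural sort, ignoring case via natsort's own algorithm flag."""
    from natsort import natsorted, ns  # third-party package
    return natsorted(items, alg=ns.IGNORECASE)


# Usage, with natsort installed:
#   names = ['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
#   natsorted_ignore_case(names)
```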
You should note that `natsort` uses a general algorithm, so it should work for just about any input that you throw at it. If you want more details on why you might choose a library to do this rather than rolling your own function, check out the `natsort` documentation's How It Works page, in particular the Special Cases Everywhere! section.
If you need a sorting key instead of a sorting function, use either of the below formulas.
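For the key-based variant, natsort provides `natsort_keygen`, which builds a key function you can hand to `list.sort` or `sorted`. A sketch under the same assumptions as before (the helper name `make_natural_keys` is mine, and the import is local so the example loads without natsort installed):

```python
def make_natural_keys():
    """Build two equivalent case-insensitive natural sort keys."""
    from natsort import natsort_keygen, ns  # third-party package
    key_from_lower = natsort_keygen(key=lambda y: y.lower())
    key_from_flag = natsort_keygen(alg=ns.IGNORECASE)
    return key_from_lower, key_from_flag


# Usage, with natsort installed:
#   key1, key2 = make_natural_keys()
#   names.sort(key=key1)          # in place
#   sorted(names, key=key2)       # new list
```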
Based on the answers here, I wrote a `natural_sorted` function that behaves like the built-in function `sorted`. The source code is also available in my GitHub snippets repository: https://github.com/bdrung/snippets/blob/master/natural_sorted.py
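The function body is not included in this copy, and I have not reproduced the linked file. As a hedged sketch of what a `sorted`-like natural sort could look like (same `key` and `reverse` parameters as the built-in, any iterable accepted, new list returned), assuming case-insensitive comparison:

```python
import re


def natural_sorted(iterable, key=None, reverse=False):
    """Return a new naturally sorted list, mirroring sorted()'s signature."""
    def natural_key(item):
        text = item if key is None else key(item)
        # Digit runs compare as integers, everything else case-insensitively.
        return [
            int(part) if part.isdecimal() else part.casefold()
            for part in re.split(r"(\d+)", text)
        ]

    return sorted(iterable, key=natural_key, reverse=reverse)


names = ['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
print(natural_sorted(names))
# ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
```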