Natural Sort Order in C#

2018-12-31 00:29发布

Anyone have a good resource or provide a sample of a natural order sort in C# for an FileInfo array? I am implementing the IComparer interface in my sorts.

17条回答
余欢
2楼-- · 2018-12-31 01:31

Let me explain my problem and how i was able to solve it.

Problem:- Sort files based on FileName from FileInfo objects which are retrieved from a Directory.

Solution:- I selected the file names from FileInfo and trimed the ".png" part of the file name. Now, just do List.Sort(), which sorts the filenames in Natural sorting order. Based on my testing i found that having .png messes up sorting order. Have a look at the below code

var imageNameList = new DirectoryInfo(@"C:\Temp\Images").GetFiles("*.png").Select(x =>x.Name.Substring(0, x.Name.Length - 4)).ToList();
imageNameList.Sort();
查看更多
弹指情弦暗扣
3楼-- · 2018-12-31 01:32

Just thought I'd add to this (with the most concise solution I could find):

public static IOrderedEnumerable<T> OrderByAlphaNumeric<T>(this IEnumerable<T> source, Func<T, string> selector)
{
    int max = source
        .SelectMany(i => Regex.Matches(selector(i), @"\d+").Cast<Match>().Select(m => (int?)m.Value.Length))
        .Max() ?? 0;

    return source.OrderBy(i => Regex.Replace(selector(i), @"\d+", m => m.Value.PadLeft(max, '0')));
}

The above pads any numbers in the string to the max length of all numbers in all strings and uses the resulting string to sort.

The cast to (int?) is to allow for collections of strings without any numbers (.Max() on an empty enumerable throws an InvalidOperationException).

查看更多
美炸的是我
4楼-- · 2018-12-31 01:33

The easiest thing to do is just P/Invoke the built-in function in Windows, and use it as the comparison function in your IComparer:

[DllImport("shlwapi.dll", CharSet = CharSet.Unicode)]
private static extern int StrCmpLogicalW(string psz1, string psz2);

Michael Kaplan has some examples of how this function works here, and the changes that were made for Vista to make it work more intuitively. The plus side of this function is that it will have the same behaviour as the version of Windows it runs on, however this does mean that it differs between versions of Windows so you need to consider whether this is a problem for you.

So a complete implementation would be something like:

[SuppressUnmanagedCodeSecurity]
internal static class SafeNativeMethods
{
    [DllImport("shlwapi.dll", CharSet = CharSet.Unicode)]
    public static extern int StrCmpLogicalW(string psz1, string psz2);
}

public sealed class NaturalStringComparer : IComparer<string>
{
    public int Compare(string a, string b)
    {
        return SafeNativeMethods.StrCmpLogicalW(a, b);
    }
}

public sealed class NaturalFileInfoNameComparer : IComparer<FileInfo>
{
    public int Compare(FileInfo a, FileInfo b)
    {
        return SafeNativeMethods.StrCmpLogicalW(a.Name, b.Name);
    }
}
查看更多
人间绝色
5楼-- · 2018-12-31 01:33

None of the existing implementations looked great so I wrote my own. The results are almost identical to the sorting used by modern versions of Windows Explorer (Windows 7/8). The only differences I've seen are 1) although Windows used to (e.g. XP) handle numbers of any length, it's now limited to 19 digits - mine is unlimited, 2) Windows gives inconsistent results with certain sets of Unicode digits - mine works fine (although it doesn't numerically compare digits from surrogate pairs; nor does Windows), and 3) mine can't distinguish different types of non-primary sort weights if they occur in different sections (e.g. "e-1é" vs "é1e-" - the sections before and after the number have diacritic and punctuation weight differences).

public static int CompareNatural(string strA, string strB) {
    return CompareNatural(strA, strB, CultureInfo.CurrentCulture, CompareOptions.IgnoreCase);
}

public static int CompareNatural(string strA, string strB, CultureInfo culture, CompareOptions options) {
    CompareInfo cmp = culture.CompareInfo;
    int iA = 0;
    int iB = 0;
    int softResult = 0;
    int softResultWeight = 0;
    while (iA < strA.Length && iB < strB.Length) {
        bool isDigitA = Char.IsDigit(strA[iA]);
        bool isDigitB = Char.IsDigit(strB[iB]);
        if (isDigitA != isDigitB) {
            return cmp.Compare(strA, iA, strB, iB, options);
        }
        else if (!isDigitA && !isDigitB) {
            int jA = iA + 1;
            int jB = iB + 1;
            while (jA < strA.Length && !Char.IsDigit(strA[jA])) jA++;
            while (jB < strB.Length && !Char.IsDigit(strB[jB])) jB++;
            int cmpResult = cmp.Compare(strA, iA, jA - iA, strB, iB, jB - iB, options);
            if (cmpResult != 0) {
                // Certain strings may be considered different due to "soft" differences that are
                // ignored if more significant differences follow, e.g. a hyphen only affects the
                // comparison if no other differences follow
                string sectionA = strA.Substring(iA, jA - iA);
                string sectionB = strB.Substring(iB, jB - iB);
                if (cmp.Compare(sectionA + "1", sectionB + "2", options) ==
                    cmp.Compare(sectionA + "2", sectionB + "1", options))
                {
                    return cmp.Compare(strA, iA, strB, iB, options);
                }
                else if (softResultWeight < 1) {
                    softResult = cmpResult;
                    softResultWeight = 1;
                }
            }
            iA = jA;
            iB = jB;
        }
        else {
            char zeroA = (char)(strA[iA] - (int)Char.GetNumericValue(strA[iA]));
            char zeroB = (char)(strB[iB] - (int)Char.GetNumericValue(strB[iB]));
            int jA = iA;
            int jB = iB;
            while (jA < strA.Length && strA[jA] == zeroA) jA++;
            while (jB < strB.Length && strB[jB] == zeroB) jB++;
            int resultIfSameLength = 0;
            do {
                isDigitA = jA < strA.Length && Char.IsDigit(strA[jA]);
                isDigitB = jB < strB.Length && Char.IsDigit(strB[jB]);
                int numA = isDigitA ? (int)Char.GetNumericValue(strA[jA]) : 0;
                int numB = isDigitB ? (int)Char.GetNumericValue(strB[jB]) : 0;
                if (isDigitA && (char)(strA[jA] - numA) != zeroA) isDigitA = false;
                if (isDigitB && (char)(strB[jB] - numB) != zeroB) isDigitB = false;
                if (isDigitA && isDigitB) {
                    if (numA != numB && resultIfSameLength == 0) {
                        resultIfSameLength = numA < numB ? -1 : 1;
                    }
                    jA++;
                    jB++;
                }
            }
            while (isDigitA && isDigitB);
            if (isDigitA != isDigitB) {
                // One number has more digits than the other (ignoring leading zeros) - the longer
                // number must be larger
                return isDigitA ? 1 : -1;
            }
            else if (resultIfSameLength != 0) {
                // Both numbers are the same length (ignoring leading zeros) and at least one of
                // the digits differed - the first difference determines the result
                return resultIfSameLength;
            }
            int lA = jA - iA;
            int lB = jB - iB;
            if (lA != lB) {
                // Both numbers are equivalent but one has more leading zeros
                return lA > lB ? -1 : 1;
            }
            else if (zeroA != zeroB && softResultWeight < 2) {
                softResult = cmp.Compare(strA, iA, 1, strB, iB, 1, options);
                softResultWeight = 2;
            }
            iA = jA;
            iB = jB;
        }
    }
    if (iA < strA.Length || iB < strB.Length) {
        return iA < strA.Length ? 1 : -1;
    }
    else if (softResult != 0) {
        return softResult;
    }
    return 0;
}

The signature matches the Comparison<string> delegate:

string[] files = Directory.GetFiles(@"C:\");
Array.Sort(files, CompareNatural);

Here's a wrapper class for use as IComparer<string>:

public class CustomComparer<T> : IComparer<T> {
    private Comparison<T> _comparison;

    public CustomComparer(Comparison<T> comparison) {
        _comparison = comparison;
    }

    public int Compare(T x, T y) {
        return _comparison(x, y);
    }
}

Example:

string[] files = Directory.EnumerateFiles(@"C:\")
    .OrderBy(f => f, new CustomComparer<string>(CompareNatural))
    .ToArray();

Here's a good set of filenames I use for testing:

Func<string, string> expand = (s) => { int o; while ((o = s.IndexOf('\\')) != -1) { int p = o + 1;
    int z = 1; while (s[p] == '0') { z++; p++; } int c = Int32.Parse(s.Substring(p, z));
    s = s.Substring(0, o) + new string(s[o - 1], c) + s.Substring(p + z); } return s; };
string encodedFileNames =
    "KDEqLW4xMiotbjEzKjAwMDFcMDY2KjAwMlwwMTcqMDA5XDAxNyowMlwwMTcqMDlcMDE3KjEhKjEtISox" +
    "LWEqMS4yNT8xLjI1KjEuNT8xLjUqMSoxXDAxNyoxXDAxOCoxXDAxOSoxXDA2NioxXDA2NyoxYSoyXDAx" +
    "NyoyXDAxOCo5XDAxNyo5XDAxOCo5XDA2Nio9MSphMDAxdGVzdDAxKmEwMDF0ZXN0aW5nYTBcMzEqYTAw" +
    "Mj9hMDAyIGE/YTAwMiBhKmEwMDIqYTAwMmE/YTAwMmEqYTAxdGVzdGluZ2EwMDEqYTAxdnNmcyphMSph" +
    "MWEqYTF6KmEyKmIwMDAzcTYqYjAwM3E0KmIwM3E1KmMtZSpjZCpjZipmIDEqZipnP2cgMT9oLW4qaG8t" +
    "bipJKmljZS1jcmVhbT9pY2VjcmVhbT9pY2VjcmVhbS0/ajBcNDE/ajAwMWE/ajAxP2shKmsnKmstKmsx" +
    "KmthKmxpc3QqbTAwMDNhMDA1YSptMDAzYTAwMDVhKm0wMDNhMDA1Km0wMDNhMDA1YSpuMTIqbjEzKm8t" +
    "bjAxMypvLW4xMipvLW40P28tbjQhP28tbjR6P28tbjlhLWI1Km8tbjlhYjUqb24wMTMqb24xMipvbjQ/" +
    "b240IT9vbjR6P29uOWEtYjUqb245YWI1Km/CrW4wMTMqb8KtbjEyKnAwMCpwMDEqcDAxwr0hKnAwMcK9" +
    "KnAwMcK9YSpwMDHCvcK+KnAwMipwMMK9KnEtbjAxMypxLW4xMipxbjAxMypxbjEyKnItMDAhKnItMDAh" +
    "NSpyLTAwIe+8lSpyLTAwYSpyLe+8kFwxIS01KnIt77yQXDEhLe+8lSpyLe+8kFwxISpyLe+8kFwxITUq" +
    "ci3vvJBcMSHvvJUqci3vvJBcMWEqci3vvJBcMyE1KnIwMCEqcjAwLTUqcjAwLjUqcjAwNSpyMDBhKnIw" +
    "NSpyMDYqcjQqcjUqctmg2aYqctmkKnLZpSpy27Dbtipy27Qqctu1KnLfgN+GKnLfhCpy34UqcuClpuCl" +
    "rCpy4KWqKnLgpasqcuCnpuCnrCpy4KeqKnLgp6sqcuCppuCprCpy4KmqKnLgqasqcuCrpuCrrCpy4Kuq" +
    "KnLgq6sqcuCtpuCtrCpy4K2qKnLgrasqcuCvpuCvrCpy4K+qKnLgr6sqcuCxpuCxrCpy4LGqKnLgsasq" +
    "cuCzpuCzrCpy4LOqKnLgs6sqcuC1puC1rCpy4LWqKnLgtasqcuC5kOC5lipy4LmUKnLguZUqcuC7kOC7" +
    "lipy4LuUKnLgu5UqcuC8oOC8pipy4LykKnLgvKUqcuGBgOGBhipy4YGEKnLhgYUqcuGCkOGClipy4YKU" +
    "KnLhgpUqcuGfoOGfpipy4Z+kKnLhn6UqcuGgkOGglipy4aCUKnLhoJUqcuGlhuGljCpy4aWKKnLhpYsq" +
    "cuGnkOGnlipy4aeUKnLhp5UqcuGtkOGtlipy4a2UKnLhrZUqcuGusOGutipy4a60KnLhrrUqcuGxgOGx" +
    "hipy4bGEKnLhsYUqcuGxkOGxlipy4bGUKnLhsZUqcuqYoFwx6pilKnLqmKDqmKUqcuqYoOqYpipy6pik" +
    "KnLqmKUqcuqjkOqjlipy6qOUKnLqo5UqcuqkgOqkhipy6qSEKnLqpIUqcuqpkOqplipy6qmUKnLqqZUq" +
    "cvCQkqAqcvCQkqUqcvCdn5gqcvCdn50qcu+8kFwxISpy77yQXDEt77yVKnLvvJBcMS7vvJUqcu+8kFwx" +
    "YSpy77yQXDHqmKUqcu+8kFwx77yO77yVKnLvvJBcMe+8lSpy77yQ77yVKnLvvJDvvJYqcu+8lCpy77yV" +
    "KnNpKnPEsSp0ZXN02aIqdGVzdNmi2aAqdGVzdNmjKnVBZS0qdWFlKnViZS0qdUJlKnVjZS0xw6kqdWNl" +
    "McOpLSp1Y2Uxw6kqdWPDqS0xZSp1Y8OpMWUtKnVjw6kxZSp3ZWlhMSp3ZWlhMip3ZWlzczEqd2Vpc3My" +
    "KndlaXoxKndlaXoyKndlacOfMSp3ZWnDnzIqeSBhMyp5IGE0KnknYTMqeSdhNCp5K2EzKnkrYTQqeS1h" +
    "Myp5LWE0KnlhMyp5YTQqej96IDA1MD96IDIxP3ohMjE/ejIwP3oyMj96YTIxP3rCqTIxP1sxKl8xKsKt" +
    "bjEyKsKtbjEzKsSwKg==";
string[] fileNames = Encoding.UTF8.GetString(Convert.FromBase64String(encodedFileNames))
    .Replace("*", ".txt?").Split(new[] { "?" }, StringSplitOptions.RemoveEmptyEntries)
    .Select(n => expand(n)).ToArray();
查看更多
浅入江南
6楼-- · 2018-12-31 01:33

I've actually implemented it as an extension method on the StringComparer so that you could do for example:

  • StringComparer.CurrentCulture.WithNaturalSort() or
  • StringComparer.OrdinalIgnoreCase.WithNaturalSort().

The resulting IComparer<string> can be used in all places like OrderBy, OrderByDescending, ThenBy, ThenByDescending, SortedSet<string>, etc. And you can still easily tweak case sensitivity, culture, etc.

The implementation is fairly trivial and it should perform quite well even on large sequences.


I've also published it as a tiny NuGet package, so you can just do:

Install-Package NaturalSort.Extension

The code including XML documentation comments and suite of tests is available in the NaturalSort.Extension GitHub repository.


The entire code is this (if you cannot use C# 7 yet, just install the NuGet package):

public static class StringComparerNaturalSortExtension
{
    public static IComparer<string> WithNaturalSort(this StringComparer stringComparer) => new NaturalSortComparer(stringComparer);

    private class NaturalSortComparer : IComparer<string>
    {
        public NaturalSortComparer(StringComparer stringComparer)
        {
            _stringComparer = stringComparer;
        }

        private readonly StringComparer _stringComparer;
        private static readonly Regex NumberSequenceRegex = new Regex(@"(\d+)", RegexOptions.Compiled | RegexOptions.CultureInvariant);
        private static string[] Tokenize(string s) => s == null ? new string[] { } : NumberSequenceRegex.Split(s);
        private static ulong ParseNumberOrZero(string s) => ulong.TryParse(s, NumberStyles.None, CultureInfo.InvariantCulture, out var result) ? result : 0;

        public int Compare(string s1, string s2)
        {
            var tokens1 = Tokenize(s1);
            var tokens2 = Tokenize(s2);

            var zipCompare = tokens1.Zip(tokens2, TokenCompare).FirstOrDefault(x => x != 0);
            if (zipCompare != 0)
                return zipCompare;

            var lengthCompare = tokens1.Length.CompareTo(tokens2.Length);
            return lengthCompare;
        }

        private int TokenCompare(string token1, string token2)
        {
            var number1 = ParseNumberOrZero(token1);
            var number2 = ParseNumberOrZero(token2);

            var numberCompare = number1.CompareTo(number2);
            if (numberCompare != 0)
                return numberCompare;

            var stringCompare = _stringComparer.Compare(token1, token2);
            return stringCompare;
        }
    }
}
查看更多
登录 后发表回答