Sorting mixed numbers and strings

2019-02-04 14:13发布

问题:

I have a list of strings that can contain a letter or a string representation of an int (max 2 digits). They need to be sorted either alphabetically or (when it is actually an int) on the numerical value it represents.

Example:

IList<string> input = new List<string>()
    {"a", 1.ToString(), 2.ToString(), "b", 10.ToString()};

input.OrderBy(s=>s)
  // 1
  // 10
  // 2
  // a
  // b

What I would want is

  // 1
  // 2
  // 10
  // a
  // b

I have some idea involving formatting it with trying to parse it, then if it is a successfull tryparse to format it with my own custom stringformatter to make it have preceding zeros. I'm hoping for something more simple and performant.

Edit
I ended up making an IComparer I dumped in my Utils library for later use.
While I was at it I threw doubles in the mix too.

public class MixedNumbersAndStringsComparer : IComparer<string> {
    public int Compare(string x, string y) {
        double xVal, yVal;

        if(double.TryParse(x, out xVal) && double.TryParse(y, out yVal))
            return xVal.CompareTo(yVal);
        else 
            return string.Compare(x, y);
    }
}

//Tested on int vs int, double vs double, int vs double, string vs int, string vs doubl, string vs string.
//Not gonna put those here
[TestMethod]
public void RealWorldTest()
{
    List<string> input = new List<string>() { "a", "1", "2,0", "b", "10" };
    List<string> expected = new List<string>() { "1", "2,0", "10", "a", "b" };
    input.Sort(new MixedNumbersAndStringsComparer());
    CollectionAssert.AreEquivalent(expected, input);
}

回答1:

Perhaps you could go with a more generic approach and use a natural sorting algorithm such as the C# implementation here.



回答2:

Two ways come to mind, not sure which is more performant. Implement a custom IComparer:

class MyComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        int xVal, yVal;
        var xIsVal = int.TryParse( x, out xVal );
        var yIsVal = int.TryParse( y, out yVal );

        if (xIsVal && yIsVal)   // both are numbers...
            return xVal.CompareTo(yVal);
        if (!xIsVal && !yIsVal) // both are strings...
            return x.CompareTo(y);
        if (xIsVal)             // x is a number, sort first
            return -1;
        return 1;               // x is a string, sort last
    }
}

var input = new[] {"a", "1", "10", "b", "2", "c"};
var e = input.OrderBy( s => s, new MyComparer() );

Or, split the sequence into numbers and non-numbers, then sort each subgroup, finally join the sorted results; something like:

var input = new[] {"a", "1", "10", "b", "2", "c"};

var result = input.Where( s => s.All( x => char.IsDigit( x ) ) )
                  .OrderBy( r => { int z; int.TryParse( r, out z ); return z; } )
                  .Union( input.Where( m => m.Any( x => !char.IsDigit( x ) ) )
                               .OrderBy( q => q ) );


回答3:

Use the other overload of OrderBy that takes an IComparer parameter.

You can then implement your own IComparer that uses int.TryParse to tell if it's a number or not.



回答4:

I'd say you could split up the values using a RegularExpression (assuming everything is an int) and then rejoin them together.

//create two lists to start
string[] data = //whatever...
List<int> numbers = new List<int>();
List<string> words = new List<string>();

//check each value
foreach (string item in data) {
    if (Regex.IsMatch("^\d+$", item)) {
        numbers.Add(int.Parse(item));
    }
    else {
        words.Add(item);
    }
}

Then with your two lists you can sort each of them and then merge them back together in whatever format you want.



回答5:

You could just use function provided by the Win32 API:

[DllImport ("shlwapi.dll", CharSet=CharSet.Unicode, ExactSpelling=true)]
static extern int StrCmpLogicalW (String x, String y);

and call it from an IComparer as others have shown.



回答6:

public static int? TryParse(string s)
{
    int i;
    return int.TryParse(s, out i) ? (int?)i : null;
}

// in your method
IEnumerable<string> input = new string[] {"a", "1","2", "b", "10"};
var list = input.Select(s => new { IntVal = TryParse(s), String =s}).ToList();
list.Sort((s1, s2) => {
    if(s1.IntVal == null && s2.IntVal == null)
    {
        return s1.String.CompareTo(s2.String);
    }
    if(s1.IntVal == null)
    {
        return 1;
    }
    if(s2.IntVal == null)
    {
        return -1;
    }
    return s1.IntVal.Value.CompareTo(s2.IntVal.Value);
});
input = list.Select(s => s.String);

foreach(var x in input)
{
    Console.WriteLine(x);
}

It still does the conversion, but only once/item.



回答7:

You could use a custom comparer - the ordering statement would then be:

var result = input.OrderBy(s => s, new MyComparer());

where MyComparer is defined like this:

public class MyComparer : Comparer<string>
{
    public override int Compare(string x, string y)
    {

        int xNumber;
        int yNumber;
        var xIsNumber = int.TryParse(x, out xNumber);
        var yIsNumber = int.TryParse(y, out yNumber);

        if (xIsNumber && yIsNumber)
        {
            return xNumber.CompareTo(yNumber);
        }
        if (xIsNumber)
        {
            return -1;
        }
        if (yIsNumber)
        {
            return 1;
        }
        return x.CompareTo(y);
    }
}

Although this may seem a bit verbose, it encapsulates the sorting logic into a proper type. You can then, if you wish, easily subject the Comparer to automated testing (unit testing). It is also reusable.

(It may be possible to make the algorithm a bit clearer, but this was the best I could quickly throw together.)



回答8:

You could also "cheat" in some sense. Based on your description of the problem, You know any String of length 2 will be a number. So just sort all the Strings of length 1. And then sort all the Strings of length 2. And then do a bunch of swapping to re-order your Strings in the correct order. Essentially the process will work as follows: (assuming your data is in an array.)

Step 1: Push all Strings of length 2 to the end of the array. Keeping track of how many you have.

Step 2: In place sort the Strings of length 1 and Strings of length 2.

Step 3: Binary search for 'a' which would be on the boundary of your two halves.

Step 4: Swap your two digit Strings with the letters as necessary.

That said, while this approach will work, does not involve regular expressions, and does not attempt to parse non-int values as an int -- I don't recommend it. You'll be writing significantly more code than other approaches already suggested. It obfuscates the point of what you are trying to do. It doesn't work if you suddenly get two letter Strings or three digit Strings. Etc. I'm just including it to show how you can look at problems differently, and come up with alternative solutions.



回答9:

Use a Schwartzian Transform to perform O(n) conversions!

private class Normalized : IComparable<Normalized> {
  private readonly string str;
  private readonly int val;

  public Normalized(string s) {
    str = s;

    val = 0;
    foreach (char c in s) {
      val *= 10;

      if (c >= '0' && c <= '9')
        val += c - '0';
      else
        val += 100 + c;
    }
  }

  public String Value { get { return str; } }

  public int CompareTo(Normalized n) { return val.CompareTo(n.val); }
};

private static Normalized In(string s) { return new Normalized(s); }
private static String Out(Normalized n) { return n.Value; }

public static IList<String> MixedSort(List<String> l) {
  var tmp = l.ConvertAll(new Converter<String,Normalized>(In));
  tmp.Sort();
  return tmp.ConvertAll(new Converter<Normalized,String>(Out));
}


回答10:

I had a similar problem and landed here: sorting strings that have a numeric suffix as in the following example.

Original:

"Test2", "Test1", "Test10", "Test3", "Test20"

Default sort result:

"Test1", "Test10", "Test2", "Test20", "Test3"

Desired sort result:

"Test1", "Test2", "Test3, "Test10", "Test20"

I ended up using a custom Comparer:

public class NaturalComparer : IComparer
{

    public NaturalComparer()
    {
        _regex = new Regex("\\d+$", RegexOptions.IgnoreCase);
    }

    private Regex _regex;

    private string matchEvaluator(System.Text.RegularExpressions.Match m)
    {
        return Convert.ToInt32(m.Value).ToString("D10");
    }

    public int Compare(object x, object y)
    {
        x = _regex.Replace(x.ToString, matchEvaluator);
        y = _regex.Replace(y.ToString, matchEvaluator);

        return x.CompareTo(y);
    }
}   

HTH ;o)