Case insensitive group on multiple columns

Posted 2019-06-15 21:30

Question:

Is there any way to write a LINQ to SQL query that does something similar to this:

var result = source.GroupBy(a => new { a.Column1, a.Column2 });

or

var result = from s in source
             group s by new { s.Column1, s.Column2 } into c
             select new { Column1 = c.Key.Column1, Column2 = c.Key.Column2 };

but ignores the case of the values in the grouped columns?

Answer 1:

You can pass StringComparer.InvariantCultureIgnoreCase to the GroupBy extension method.

var result = source.GroupBy(a => new { a.Column1, a.Column2 }, 
                StringComparer.InvariantCultureIgnoreCase);

Alternatively, you can call ToUpperInvariant on each field, as Hamlet Hakobyan suggested in a comment. I recommend ToUpperInvariant or ToUpper over ToLower or ToLowerInvariant, because upper-case normalization is the one optimized for comparison purposes.
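A minimal sketch of the ToUpperInvariant approach, run in memory with LINQ to Objects. Row and CountGroups are hypothetical names standing in for the question's element type and query. It is not confirmed here that LINQ to SQL translates ToUpperInvariant into SQL, so ToUpper() may be needed against a database. Note that the group keys hold the upper-cased values, not the original casing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type with the two string columns from the question.
public class Row
{
    public string Column1 { get; set; }
    public string Column2 { get; set; }
}

public static class UpperKeyGrouping
{
    // Groups rows on both columns after normalizing each value with ToUpperInvariant(),
    // so "abc"/"X" and "ABC"/"x" fall into the same group. Assumes no null values.
    public static int CountGroups(IEnumerable<Row> source)
    {
        // Anonymous types compare structurally, so equal upper-cased pairs share a group.
        var result = source.GroupBy(a => new
        {
            Column1 = a.Column1.ToUpperInvariant(),
            Column2 = a.Column2.ToUpperInvariant()
        });
        return result.Count();
    }
}
```

Because the key is still an anonymous type, no custom comparer is needed: the normalization happens before the keys are compared.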



Answer 2:

I couldn't get NaveenBhat's solution to work; it fails with a compile error:

The type arguments for method 'System.Linq.Enumerable.GroupBy<TSource,TKey>(System.Collections.Generic.IEnumerable<TSource>, System.Func<TSource,TKey>, System.Collections.Generic.IEqualityComparer<TKey>)' cannot be inferred from the usage. Try specifying the type arguments explicitly.

To make it work, I found it easiest and clearest to define a class to hold my key columns (GroupKey) and a separate class that implements IEqualityComparer<GroupKey> (KeyComparer). I can then call:

var result = source.GroupBy(r => new GroupKey(r), new KeyComparer());

The KeyComparer class does compare the strings with the InvariantCultureIgnoreCase comparer, so kudos to NaveenBhat for pointing me in the right direction.

Simplified versions of my classes:

using System;
using System.Collections.Generic;

// Both classes are nested inside the class that runs the query, hence "private".
private class GroupKey
{
    public string Column1 { get; set; }
    public string Column2 { get; set; }

    public GroupKey(SourceObject r)
    {
        this.Column1 = r.Column1;
        this.Column2 = r.Column2;
    }
}

private class KeyComparer : IEqualityComparer<GroupKey>
{
    bool IEqualityComparer<GroupKey>.Equals(GroupKey x, GroupKey y)
    {
        // string.Equals takes a StringComparison value (not a StringComparer);
        // the static overload also treats two null strings as equal.
        if (!string.Equals(x.Column1, y.Column1, StringComparison.InvariantCultureIgnoreCase)) return false;
        if (!string.Equals(x.Column2, y.Column2, StringComparison.InvariantCultureIgnoreCase)) return false;
        return true;
        // my actual code is more complex than this, with more columns to compare,
        // but you get the idea.
    }

    int IEqualityComparer<GroupKey>.GetHashCode(GroupKey obj)
    {
        return 0; // a constant hash code forces every comparison through Equals
        // Note: it would be more efficient to combine per-column hash codes from
        // StringComparer.InvariantCultureIgnoreCase.GetHashCode(...), which stay
        // consistent with the case-insensitive Equals above,
        // but my object is more complex than this simplified example.
    }
}
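A hash code that stays consistent with the case-insensitive Equals in KeyComparer can come from the same comparer: StringComparer.GetHashCode(string) returns equal values for any two strings that the comparer considers equal. A minimal sketch, where CaseInsensitiveHash and Combine are hypothetical helper names; KeyComparer's GetHashCode could return CaseInsensitiveHash.Combine(obj.Column1, obj.Column2) instead of a constant.

```csharp
using System;

// Hypothetical helper: builds a two-column hash that agrees with
// StringComparison.InvariantCultureIgnoreCase equality.
public static class CaseInsensitiveHash
{
    private static readonly StringComparer Comparer = StringComparer.InvariantCultureIgnoreCase;

    // Per-column hash codes come from the same comparer used for equality, so keys
    // that compare equal ignoring case always hash the same. Null columns contribute 0,
    // because StringComparer.GetHashCode(string) throws on null.
    public static int Combine(string column1, string column2)
    {
        int h1 = column1 == null ? 0 : Comparer.GetHashCode(column1);
        int h2 = column2 == null ? 0 : Comparer.GetHashCode(column2);
        // unchecked: overflow in the multiplication is harmless for hashing.
        return unchecked((h1 * 397) ^ h2);
    }
}
```

With a hash like this, GroupBy only calls Equals for keys that land in the same hash bucket, instead of for every pair of keys.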


Answer 3:

I had the same issue grouping by the values of DataRow objects from a DataTable, but I got past the compiler error by calling .ToString() on the column value, e.g.

MyTable.AsEnumerable().GroupBy(
    dataRow => dataRow["Value"].ToString(),
    StringComparer.InvariantCultureIgnoreCase)

instead of

MyTable.AsEnumerable().GroupBy(
    dataRow => dataRow["Value"],
    StringComparer.InvariantCultureIgnoreCase)
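The ToString() call works because it makes the key selector return a string, so the compiler can infer TKey as string, which matches StringComparer's IEqualityComparer<string>. With dataRow["Value"] alone the key is typed object (or an anonymous type, in the first answer's case), no single TKey satisfies both arguments, and inference fails with the error quoted in the second answer. A minimal sketch with plain objects in place of DataRow values; StringKeyGrouping and Group are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StringKeyGrouping
{
    // Hypothetical helper: the key selector returns string, so TKey is inferred as
    // string and the StringComparer (an IEqualityComparer<string>) is accepted.
    // Each group's key is the first value seen among those that compare equal
    // ignoring case; groups come out in order of first appearance.
    public static List<KeyValuePair<string, int>> Group(IEnumerable<object> values) =>
        values.GroupBy(v => v.ToString(), StringComparer.InvariantCultureIgnoreCase)
              .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
              .ToList();
}
```

For example, the values "Apple", "apple", "APPLE" and "Banana" produce two groups: "Apple" with three members and "Banana" with one.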


Answer 4:

I've expanded on Bill B's answer to make it more dynamic, so the column properties aren't hard-coded in GroupKey and the IEqualityComparer<> implementation.

using System;
using System.Collections.Generic;

private class GroupKey
{
    public List<string> Columns { get; } = new List<string>();

    public GroupKey(params string[] columns)
    {
        foreach (var column in columns)
        {
            // Normalizing with ToUpperInvariant() means that if you call Distinct()
            // on the keys after grouping, values that differ only in case are
            // collapsed rather than duplicated. Null values are kept as null.
            Columns.Add(column?.ToUpperInvariant());
        }
    }
}

private class KeyComparer : IEqualityComparer<GroupKey>
{
    bool IEqualityComparer<GroupKey>.Equals(GroupKey x, GroupKey y)
    {
        // Keys with a different number of columns are never equal
        // (this also avoids an index-out-of-range error below).
        if (x.Columns.Count != y.Columns.Count) return false;

        for (var i = 0; i < x.Columns.Count; i++)
        {
            if (!string.Equals(x.Columns[i], y.Columns[i], StringComparison.OrdinalIgnoreCase)) return false;
        }

        return true;
    }

    int IEqualityComparer<GroupKey>.GetHashCode(GroupKey obj)
    {
        var hashcode = 17;

        foreach (var column in obj.Columns)
        {
            // Hash with the same OrdinalIgnoreCase rule used in Equals.
            // *397 is the multiplier ReSharper normally generates to spread values;
            // it is not strictly required.
            var columnHash = column != null ? StringComparer.OrdinalIgnoreCase.GetHashCode(column) : 0;
            hashcode = unchecked((hashcode * 397) ^ columnHash);
        }

        return hashcode;
    }
}

Usage:

var result = source.GroupBy(r => new GroupKey(r.Column1, r.Column2, r.Column3), new KeyComparer());

This way, you can pass any number of columns into the GroupKey constructor.