Test whether two IEnumerable<T> have the same values

I have two multisets, both IEnumerables, and I want to compare them.

string[] names1 = { "tom", "dick", "harry" };
string[] names2 = { "tom", "dick", "harry", "harry"};
string[] names3 = { "tom", "dick", "harry", "sally" };
string[] names4 = { "dick", "harry", "tom" };

I want names1 == names4 to return true (and any collection compared with itself should obviously return true), but all other combinations should return false.

What is the most efficient way? These can be large sets of complex objects.

I looked at doing:
var a = name1.OrderBy<MyCustomType, string>(v => v.Name);
var b = name4.OrderBy<MyCustomType, string>(v => v.Name);

return a == b;

Tags: c# linq set

4 Answers

神经病院院长

#2 · 2019-02-08 13:45

The most efficient way would depend on the datatypes. A reasonably efficient O(N) solution that's very short is the following:

var list1Groups = list1.ToLookup(i => i);
var list2Groups = list2.ToLookup(i => i);
return list1Groups.Count == list2Groups.Count
    && list1Groups.All(g => g.Count() == list2Groups[g.Key].Count());

The items are required to have a valid Equals and GetHashCode implementation.
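To make the Equals/GetHashCode requirement concrete, here is a minimal sketch. The `Person` type and the `SameMultiset` helper are hypothetical names introduced only for illustration; the comparison itself is the ToLookup approach shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type, for illustration only.
// Equality and hashing are both based on Name, so they stay consistent.
public sealed class Person : IEquatable<Person>
{
    public string Name { get; }

    public Person(string name) { Name = name; }

    public bool Equals(Person other) => other != null && Name == other.Name;

    public override bool Equals(object obj) => Equals(obj as Person);

    public override int GetHashCode() => Name == null ? 0 : Name.GetHashCode();
}

public static class LookupDemo
{
    // Hypothetical helper wrapping the ToLookup comparison.
    public static bool SameMultiset(IEnumerable<Person> list1, IEnumerable<Person> list2)
    {
        var list1Groups = list1.ToLookup(p => p);
        var list2Groups = list2.ToLookup(p => p);

        // Same number of distinct items, and each item occurs equally often in both.
        return list1Groups.Count == list2Groups.Count
            && list1Groups.All(g => g.Count() == list2Groups[g.Key].Count());
    }
}
```

Without the two overrides, each `Person` instance would only be equal to itself, so two lists holding separately constructed objects with the same names would compare as different.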

If you want a faster solution, cdhowie's solution below is comparably fast at 10,000 elements and pulls ahead by a factor of 5 for large collections of simple objects, probably due to better memory efficiency.

Finally, if you're really interested in performance, I'd definitely try the Sort-then-SequenceEqual approach. Although it has worse complexity, that's just a log N factor, and those can definitely be drowned out by differences in the constant for all practical data set sizes - and you might be able to sort in-place, use arrays or even incrementally sort (which can be linear). Even at 4 billion elements, the log-base-2 is just 32; that's a relevant performance difference, but the difference in constant factor could conceivably be larger. For example, if you're dealing with arrays of ints and don't mind modifying the collection order, the following is faster than either option even for 10000000 items (twice that and I get an OutOfMemory on 32-bit):

Array.Sort(list1);
Array.Sort(list2);
return list1.SequenceEqual(list2);

YMMV depending on machine, data-type, lunar cycle, and the other usual factors influencing microbenchmarks.
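If modifying the collection order is not acceptable, one variation is to sort copies instead. This is only a sketch: `SortedCopiesEqual` is a hypothetical helper name, and the copies cost extra memory, which may well eat into the speed advantage described above.

```csharp
using System;
using System.Linq;

public static class SortedComparison
{
    // Hypothetical helper: sorts copies so the callers' arrays keep their order.
    public static bool SortedCopiesEqual(int[] list1, int[] list2)
    {
        // Cheap early exit: arrays of different length cannot match.
        if (list1.Length != list2.Length)
            return false;

        var copy1 = (int[])list1.Clone();
        var copy2 = (int[])list2.Clone();

        Array.Sort(copy1);
        Array.Sort(copy2);

        // After sorting, equal multisets become equal sequences.
        return copy1.SequenceEqual(copy2);
    }
}
```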


【Aperson】

#3 · 2019-02-08 13:48

@cdhowie's answer is great, but here's a nice trick that makes it even better for types that expose a Count property: compare the counts first, before falling back to the IEnumerable<T> overload. Just add this to your code in addition to his solution:

    public static bool UnsortedSequencesEqual<T>(this IReadOnlyList<T> first, IReadOnlyList<T> second, IEqualityComparer<T> comparer = null)
    {
        if (first.Count != second.Count)
        {
            return false;
        }

        return UnsortedSequencesEqual((IEnumerable<T>)first, (IEnumerable<T>)second, comparer);
    }


ゆ、 Hurt°

#4 · 2019-02-08 14:01

You could use a binary search tree to keep the data sorted. Each insertion is then an O(log N) operation (O(N log N) to build the whole tree). Then you can walk both trees in order, one item at a time, and break as soon as you find a pair that is not equal. This also gives you the added benefit of being able to compare the sizes of the two trees first, since duplicates would be filtered out. I'm assuming these are treated as sets, whereby {"harry", "harry"} == {"harry"}.
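For the set-semantics case described above, a rough sketch could use `SortedSet<T>`, the sorted, duplicate-free collection in the .NET base library, in place of a hand-written tree. The helper name `SameDistinctValues` is hypothetical. Note that this deliberately ignores duplicates, so it does not meet the multiset requirement in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetSemanticsDemo
{
    // Hypothetical helper: true when both inputs contain the same distinct values.
    public static bool SameDistinctValues(IEnumerable<string> first, IEnumerable<string> second)
    {
        // SortedSet drops duplicates and enumerates its items in sorted order.
        var a = new SortedSet<string>(first, StringComparer.Ordinal);
        var b = new SortedSet<string>(second, StringComparer.Ordinal);

        // Compare sizes first, then walk both sets in order;
        // SequenceEqual stops at the first mismatch.
        return a.Count == b.Count && a.SequenceEqual(b);
    }
}
```

With the arrays from the question, names1 and names2 compare as equal here because the second "harry" is filtered out, while names1 and names3 still differ.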

If you are counting duplicates, then do a quicksort or a mergesort first (O(N log N)); the comparison itself is then an O(N) operation. You could of course compare the sizes first, as two enumerables cannot be equal if their sizes differ. Since the data is sorted, the first non-equal pair you encounter makes the whole result "not equal".


三岁会撩人

#5 · 2019-02-08 14:03

First sort as you have already done, and then use Enumerable.SequenceEqual. You can use the first overload if your type implements IEquatable<MyCustomType> or overrides Equals; otherwise you will have to use the second form and provide your own IEqualityComparer<MyCustomType>.

So if your type does implement equality, just do:

return a.SequenceEqual(b);
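If the type does not implement equality, the second overload might be used roughly as follows. This is a sketch under assumptions: `MyCustomType` here is a stand-in with only a Name property, and `NameComparer` and `SameByName` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the question's MyCustomType; it does not override Equals.
public class MyCustomType
{
    public string Name { get; set; }
}

// Hypothetical comparer that treats two items as equal when their names match.
public sealed class NameComparer : IEqualityComparer<MyCustomType>
{
    public bool Equals(MyCustomType x, MyCustomType y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(x.Name, y.Name, StringComparison.Ordinal);
    }

    public int GetHashCode(MyCustomType obj) =>
        obj?.Name == null ? 0 : obj.Name.GetHashCode();
}

public static class SortThenCompare
{
    public static bool SameByName(IEnumerable<MyCustomType> first, IEnumerable<MyCustomType> second)
    {
        // Sort both sides by the same key, using the same ordering as the comparer.
        var a = first.OrderBy(v => v.Name, StringComparer.Ordinal);
        var b = second.OrderBy(v => v.Name, StringComparer.Ordinal);

        // Element-by-element comparison; also returns false if the lengths differ.
        return a.SequenceEqual(b, new NameComparer());
    }
}
```

The sort key and the comparer need to agree on what "equal" means; sorting by one property and comparing by another could report a mismatch for collections that should be considered equal.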

Here's another option that is faster, safer, and requires no sorting:

public static bool UnsortedSequencesEqual<T>(
    this IEnumerable<T> first,
    IEnumerable<T> second)
{
    return UnsortedSequencesEqual(first, second, null);
}

public static bool UnsortedSequencesEqual<T>(
    this IEnumerable<T> first,
    IEnumerable<T> second,
    IEqualityComparer<T> comparer)
{
    if (first == null)
        throw new ArgumentNullException("first");

    if (second == null)
        throw new ArgumentNullException("second");

    var counts = new Dictionary<T, int>(comparer);

    foreach (var i in first) {
        int c;
        if (counts.TryGetValue(i, out c))
            counts[i] = c + 1;
        else
            counts[i] = 1;
    }

    foreach (var i in second) {
        int c;
        if (!counts.TryGetValue(i, out c))
            return false;

        if (c == 1)
            counts.Remove(i);
        else
            counts[i] = c - 1;
    }

    return counts.Count == 0;
}
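As a usage sketch, the comparer parameter lets the same counting approach work with a different notion of equality, for example case-insensitive names. The block below contains a condensed copy of the method above so that it compiles on its own; the class name `UnsortedSequenceDemo` is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class UnsortedSequenceDemo
{
    // Condensed copy of the counting approach above, kept self-contained.
    public static bool UnsortedSequencesEqual<T>(
        IEnumerable<T> first,
        IEnumerable<T> second,
        IEqualityComparer<T> comparer)
    {
        if (first == null) throw new ArgumentNullException(nameof(first));
        if (second == null) throw new ArgumentNullException(nameof(second));

        // A null comparer makes Dictionary fall back to the default comparer.
        var counts = new Dictionary<T, int>(comparer);

        // Count how often each item occurs in the first sequence.
        foreach (var item in first)
        {
            counts.TryGetValue(item, out int c); // c is 0 when the key is absent
            counts[item] = c + 1;
        }

        // Consume those counts with the second sequence.
        foreach (var item in second)
        {
            if (!counts.TryGetValue(item, out int c))
                return false; // item missing, or it occurs more often in second

            if (c == 1)
                counts.Remove(item);
            else
                counts[item] = c - 1;
        }

        // Anything left over occurred more often in the first sequence.
        return counts.Count == 0;
    }
}
```

With the arrays from the question, `UnsortedSequencesEqual(names1, names4, null)` returns true, while names1 against names2 or names3 returns false. Passing `StringComparer.OrdinalIgnoreCase` would make { "Tom", "DICK", "harry" } match names1. One caveat: Dictionary does not accept null keys, so sequences containing null elements would throw.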
