Comparing two collections for equality irrespectiv

2018-12-31 20:24发布

I would like to compare two collections (in C#), but I'm not sure of the best way to implement this efficiently.

I've read the other thread about Enumerable.SequenceEqual, but it's not exactly what I'm looking for.

In my case, two collections would be equal if they both contain the same items (no matter the order).

Example:

collection1 = {1, 2, 3, 4};
collection2 = {2, 4, 1, 3};

collection1 == collection2; // true

What I usually do is to loop through each item of one collection and see if it exists in the other collection, then loop through each item of the other collection and see if it exists in the first collection. (I start by comparing the lengths).

if (collection1.Count != collection2.Count)
    return false; // the collections are not equal

foreach (Item item in collection1)
{
    if (!collection2.Contains(item))
        return false; // the collections are not equal
}

foreach (Item item in collection2)
{
    if (!collection1.Contains(item))
        return false; // the collections are not equal
}

return true; // the collections are equal

However, this is not entirely correct, and it's probably not the most efficient way to do compare two collections for equality.

An example I can think of that would be wrong is:

collection1 = {1, 2, 3, 3, 4}
collection2 = {1, 2, 2, 3, 4}

Which would be equal with my implementation. Should I just count the number of times each item is found and make sure the counts are equal in both collections?


The examples are in some sort of C# (let's call it pseudo-C#), but give your answer in whatever language you wish, it does not matter.

Note: I used integers in the examples for simplicity, but I want to be able to use reference-type objects too (they do not behave correctly as keys because only the reference of the object is compared, not the content).

18条回答
荒废的爱情
2楼-- · 2018-12-31 20:32

In many cases the only suitable answer is the one of Igor Ostrovsky , other answers are based on objects hash code. But when you generate an hash code for an object you do so only based on his IMMUTABLE fields - such as object Id field (in case of a database entity) - Why is it important to override GetHashCode when Equals method is overridden?

This means , that if you compare two collections , the result might be true of the compare method even though the fields of the different items are non-equal . To deep compare collections , you need to use Igor's method and implement IEqualirity .

Please read the comments of me and mr.Schnider's on his most voted post.

James

查看更多
有味是清欢
3楼-- · 2018-12-31 20:34

A duplicate post of sorts, but check out my solution for comparing collections. It's pretty simple:

This will perform an equality comparison regardless of order:

var list1 = new[] { "Bill", "Bob", "Sally" };
var list2 = new[] { "Bob", "Bill", "Sally" };
bool isequal = list1.Compare(list2).IsSame;

This will check to see if items were added / removed:

var list1 = new[] { "Billy", "Bob" };
var list2 = new[] { "Bob", "Sally" };
var diff = list1.Compare(list2);
var onlyinlist1 = diff.Removed; //Billy
var onlyinlist2 = diff.Added;   //Sally
var inbothlists = diff.Equal;   //Bob

This will see what items in the dictionary changed:

var original = new Dictionary<int, string>() { { 1, "a" }, { 2, "b" } };
var changed = new Dictionary<int, string>() { { 1, "aaa" }, { 2, "b" } };
var diff = original.Compare(changed, (x, y) => x.Value == y.Value, (x, y) => x.Value == y.Value);
foreach (var item in diff.Different)
  Console.Write("{0} changed to {1}", item.Key.Value, item.Value.Value);
//Will output: a changed to aaa

Original post here.

查看更多
低头抚发
4楼-- · 2018-12-31 20:35

Create a Dictionary "dict" and then for each member in the first collection, do dict[member]++;

Then, loop over the second collection in the same way, but for each member do dict[member]--.

At the end, loop over all of the members in the dictionary:

    private bool SetEqual (List<int> left, List<int> right) {

        if (left.Count != right.Count)
            return false;

        Dictionary<int, int> dict = new Dictionary<int, int>();

        foreach (int member in left) {
            if (dict.ContainsKey(member) == false)
                dict[member] = 1;
            else
                dict[member]++;
        }

        foreach (int member in right) {
            if (dict.ContainsKey(member) == false)
                return false;
            else
                dict[member]--;
        }

        foreach (KeyValuePair<int, int> kvp in dict) {
            if (kvp.Value != 0)
                return false;
        }

        return true;

    }

Edit: As far as I can tell this is on the same order as the most efficient algorithm. This algorithm is O(N), assuming that the Dictionary uses O(1) lookups.

查看更多
只靠听说
5楼-- · 2018-12-31 20:35

If you use Shouldly, you can use ShouldAllBe with Contains.

collection1 = {1, 2, 3, 4};
collection2 = {2, 4, 1, 3};

collection1.ShouldAllBe(item=>collection2.Contains(item)); // true

And finally, you can write an extension.

public static class ShouldlyIEnumerableExtensions
{
    public static void ShouldEquivalentTo<T>(this IEnumerable<T> list, IEnumerable<T> equivalent)
    {
        list.ShouldAllBe(l => equivalent.Contains(l));
    }
}

UPDATE

A optional parameter exists on ShouldBe method.

collection1.ShouldBe(collection2, ignoreOrder: true); // true
查看更多
唯独是你
6楼-- · 2018-12-31 20:37

There are many solutions to this problem. If you don't care about duplicates, you don't have to sort both. First make sure that they have the same number of items. After that sort one of the collections. Then binsearch each item from the second collection in the sorted collection. If you don't find a given item stop and return false. The complexity of this: - sorting the first collection: NLog(N) - searching each item from second into the first: NLOG(N) so you end up with 2*N*LOG(N) assuming that they match and you look up everything. This is similar to the complexity of sorting both. Also this gives you the benefit to stop earlier if there's a difference. However, keep in mind that if both are sorted before you step into this comparison and you try sorting by use something like a qsort, the sorting will be more expensive. There are optimizations for this. Another alternative, which is great for small collections where you know the range of the elements is to use a bitmask index. This will give you a O(n) performance. Another alternative is to use a hash and look it up. For small collections it is usually a lot better to do the sorting or the bitmask index. Hashtable have the disadvantage of worse locality so keep that in mind. Again, that's only if you don't care about duplicates. If you want to account for duplicates go with sorting both.

查看更多
孤独总比滥情好
7楼-- · 2018-12-31 20:38

It turns out Microsoft already has this covered in its testing framework: CollectionAssert.AreEquivalent

Remarks

Two collections are equivalent if they have the same elements in the same quantity, but in any order. Elements are equal if their values are equal, not if they refer to the same object.

Using reflector, I modified the code behind AreEquivalent() to create a corresponding equality comparer. It is more complete than existing answers, since it takes nulls into account, implements IEqualityComparer and has some efficiency and edge case checks. plus, it's Microsoft :)

public class MultiSetComparer<T> : IEqualityComparer<IEnumerable<T>>
{
    private readonly IEqualityComparer<T> m_comparer;
    public MultiSetComparer(IEqualityComparer<T> comparer = null)
    {
        m_comparer = comparer ?? EqualityComparer<T>.Default;
    }

    public bool Equals(IEnumerable<T> first, IEnumerable<T> second)
    {
        if (first == null)
            return second == null;

        if (second == null)
            return false;

        if (ReferenceEquals(first, second))
            return true;

        if (first is ICollection<T> firstCollection && second is ICollection<T> secondCollection)
        {
            if (firstCollection.Count != secondCollection.Count)
                return false;

            if (firstCollection.Count == 0)
                return true;
        }

        return !HaveMismatchedElement(first, second);
    }

    private bool HaveMismatchedElement(IEnumerable<T> first, IEnumerable<T> second)
    {
        int firstNullCount;
        int secondNullCount;

        var firstElementCounts = GetElementCounts(first, out firstNullCount);
        var secondElementCounts = GetElementCounts(second, out secondNullCount);

        if (firstNullCount != secondNullCount || firstElementCounts.Count != secondElementCounts.Count)
            return true;

        foreach (var kvp in firstElementCounts)
        {
            var firstElementCount = kvp.Value;
            int secondElementCount;
            secondElementCounts.TryGetValue(kvp.Key, out secondElementCount);

            if (firstElementCount != secondElementCount)
                return true;
        }

        return false;
    }

    private Dictionary<T, int> GetElementCounts(IEnumerable<T> enumerable, out int nullCount)
    {
        var dictionary = new Dictionary<T, int>(m_comparer);
        nullCount = 0;

        foreach (T element in enumerable)
        {
            if (element == null)
            {
                nullCount++;
            }
            else
            {
                int num;
                dictionary.TryGetValue(element, out num);
                num++;
                dictionary[element] = num;
            }
        }

        return dictionary;
    }

    public int GetHashCode(IEnumerable<T> enumerable)
    {
        if (enumerable == null) throw new ArgumentNullException(nameof(enumerable));

        int hash = 17;

        foreach (T val in enumerable.OrderBy(x => x))
            hash = hash * 23 + (val?.GetHashCode() ?? 42);

        return hash;
    }
}

Sample usage:

var set = new HashSet<IEnumerable<int>>(new[] {new[]{1,2,3}}, new MultiSetComparer<int>());
Console.WriteLine(set.Contains(new [] {3,2,1})); //true
Console.WriteLine(set.Contains(new [] {1, 2, 3, 3})); //false

Or if you just want to compare two collections directly:

var comp = new MultiSetComparer<string>();
Console.WriteLine(comp.Equals(new[] {"a","b","c"}, new[] {"a","c","b"})); //true
Console.WriteLine(comp.Equals(new[] {"a","b","c"}, new[] {"a","b"})); //false

Finally, you can use your an equality comparer of your choice:

var strcomp = new MultiSetComparer<string>(StringComparer.OrdinalIgnoreCase);
Console.WriteLine(strcomp.Equals(new[] {"a", "b"}, new []{"B", "A"})); //true
查看更多
登录 后发表回答