I would like to compare two collections (in C#), but I'm not sure of the best way to implement this efficiently.
I've read the other thread about Enumerable.SequenceEqual, but it's not exactly what I'm looking for.
In my case, two collections would be equal if they both contain the same items (no matter the order).
Example:
collection1 = {1, 2, 3, 4};
collection2 = {2, 4, 1, 3};
collection1 == collection2; // true
What I usually do is to loop through each item of one collection and see if it exists in the other collection, then loop through each item of the other collection and see if it exists in the first collection. (I start by comparing the lengths).
if (collection1.Count != collection2.Count)
return false; // the collections are not equal
foreach (Item item in collection1)
{
if (!collection2.Contains(item))
return false; // the collections are not equal
}
foreach (Item item in collection2)
{
if (!collection1.Contains(item))
return false; // the collections are not equal
}
return true; // the collections are equal
However, this is not entirely correct, and it's probably not the most efficient way to do compare two collections for equality.
An example I can think of that would be wrong is:
collection1 = {1, 2, 3, 3, 4}
collection2 = {1, 2, 2, 3, 4}
Which would be equal with my implementation. Should I just count the number of times each item is found and make sure the counts are equal in both collections?
The examples are in some sort of C# (let's call it pseudo-C#), but give your answer in whatever language you wish, it does not matter.
Note: I used integers in the examples for simplicity, but I want to be able to use reference-type objects too (they do not behave correctly as keys because only the reference of the object is compared, not the content).
Allowing for duplicates in the
IEnumerable<T>
(if sets are not desirable\possible) and "ignoring order" you should be able to use a.GroupBy()
.I'm not an expert on the complexity measurements, but my rudimentary understanding is that this should be O(n). I understand O(n^2) as coming from performing an O(n) operation inside another O(n) operation like
ListA.Where(a => ListB.Contains(a)).ToList()
. Every item in ListB is evaluated for equality against each item in ListA.Like I said, my understanding on complexity is limited, so correct me on this if I'm wrong.
Why not use .Except()
http://msdn.microsoft.com/en-us/library/bb397894.aspx
This is my (heavily influenced by D.Jennings) generic implementation of the comparison method (in C#):
EDIT: I realized as soon as I posed that this really only works for sets -- it will not properly deal with collections that have duplicate items. For example { 1, 1, 2 } and { 2, 2, 1 } will be considered equal from this algorithm's perspective. If your collections are sets (or their equality can be measured that way), however, I hope you find the below useful.
The solution I use is:
Linq does the dictionary thing under the covers, so this is also O(N). (Note, it's O(1) if the collections aren't the same size).
I did a sanity check using the "SetEqual" method suggested by Daniel, the OrderBy/SequenceEquals method suggested by Igor, and my suggestion. The results are below, showing O(N*LogN) for Igor and O(N) for mine and Daniel's.
I think the simplicity of the Linq intersect code makes it the preferable solution.
Solution requires .NET 3.5 and the
System.Collections.Generic
namespace. According to Microsoft,SymmetricExceptWith
is an O(n + m) operation, with n representing the number of elements in the first set and m representing the number of elements in the second. You could always add an equality comparer to this function if necessary.A simple and fairly efficient solution is to sort both collections and then compare them for equality:
This algorithm is O(N*logN), while your solution above is O(N^2).
If the collections have certain properties, you may be able to implement a faster solution. For example, if both of your collections are hash sets, they cannot contain duplicates. Also, checking whether a hash set contains some element is very fast. In that case an algorithm similar to yours would likely be fastest.