I'm having a problem with my work that hopefully reduces to the following: I have two List<int>
s, and I want to see if any of the int
s in ListA
are equal to any int
in ListB
. (They can be arrays if that makes life easier, but I think List<>
has some built-in magic that might help.) And I'm sure this is a LINQ-friendly problem, but I'm working in 2.0 here.
My solution so far has been to foreach
through ListA, and then foreach
through ListB,
foreach (int a in ListA)
{
foreach (int b in ListB)
{
if (a == b)
{
return true;
}
}
}
which was actually pretty slick when they were each three items long, but now they're 200 long and they frequently don't match, so we get the worst-case of N^2 comparisons. Even 40,000 comparisons go by pretty fast, but I think I might be missing something, since N^2 seems pretty naive for this particular problem.
Thanks!
Instead of iterating through each list, take a look at the List.Contains method:
How about using BinarySearch method instead of iterating over all elements in the inner loop?
Chris gives an O(N) solution by hashing. Now, depending on the constant factor (due to hashing), it might be worth considering an O(N log(N)) solution by sorting. There are a few different variants that you may consider depending on your use case.
Sort ListB ( O(N log(N) ), and use a searching algorithm to parse each element in ListA (which is again O(N) * O(log(N))).
Sort both ListA and ListB ( O(N log(N) ), and use an O(N) algorithm to compare these lists for duplicates.
If both lists are going to be used more than once, the second method is preferred.
With LINQ, this is trivial, as you can call the
Intersect
extension method on theEnumerable
class to give you the set intersection of the two arrays:However, this is the set intersection, meaning if
ListA
andListB
don't have unique values in it, you won't get any copies. In other words if you have the following:Then
ListA.Intersect(ListB)
produces:If you're expecting:
Then you're going to have to maintain a count of the items yourself and yield/decrement as you scan the two lists.
First, you'd want to collect a
Dictionary<TKey, int>
with the lists of the individual items:From there, you can scan
ListB
and place that in a list when you come across an item incountsOfA
:You can wrap this up in an extension method that defers execution like so:
Note that both of these approaches are (and I apologize if I'm butchering Big-O notation here)
O(N + M)
whereN
is the number of items in the first array, andM
is the number of items in the second array. You have to scan each list only once, and it's assumed that getting the hash codes and performing lookups on the hash codes is anO(1)
(constant) operation.Load the whole of ListA into a HashSet instance, and then test foreach item in ListB against the HastSet: I'm pretty sure that this would be O(N).
Here's the same thing in one line:
HashSet doesn't exist in .NET 3.5, so in .NET 2.0 you can use
Dictionary<int,object>
(instead of usingHashSet<int>
), and always store null as the object/value in the Dictionary since you're only interested in the keys.