I'm pretty sure this is a duplicate, but I have tried everything, and I still cannot seem to get the differences. I have two lists of strings: listA and listB. I'm trying to find items in listA that are not in B.
Example: listA: "1", "2", "4", "7" listB: "2", "4" The output I want is: "1", "7"
Here is a for loop and lambda expression that I tried, but these take a really long time:
//these two approaches take too long for huge lists
foreach (var item in listA)
{
if (!listB.Contains(item))
diff.Add(id);
}
diff = listA.Where(id => !listB.Contains(id)).ToList();
//these don't give me the right differences
listA.Except(listB).ToList();
var set = new HashSet<string>(listA);
set.SymmetricExceptWith(listB);
should give the correct answer, but
should not. SymmetricExcept will give the items in listA not in listB plus the items in ListB not in ListA.
All code you posted should work fine so error is in another place anyway you write "these take a really long time" then I suppose you have a performance issue.
Let's do a very quick and dirty comparison (you know to do a good performance test is a long process, self-promotion: benchmark has been done with this free tool). Assumptions:
SymmetricExceptWith
and if not then its result is pretty different compared toExcept
). If it was a mistake just ignore tests forSymmetricExceptWith
.Two lists of 20,000 random items (test repeated 100 times then averaged, release mode).
Notes:
1 Loop with foreach
2 Loop with for
3 Hashset creation measured
4 Hashset creation not measured. I included this for reference but if you don't have first list as Hashset you can't ignore creation time.
We see
Contains()
method is pretty slow so we can drop it in bigger tests (anyway I checked and its performance won't become better or even comparable). Let's see what will happen for 1,000,000 items list.Let's try to make it parallel (please note that for this test I'm using a old Core 2 Duo 2 GHz):
Parallel performance are worse and LINQ Except is best option now. Let's see how it works on a better CPU (Xeon 2.8 GHz, quad core). Also note that with such big amount of data cache size won't affect testing too much.
To summarize: for relatively small lists
SymmetricExceptWith()
will perform better, for big listsExcept()
is always better. If you're targeting a modern multi-core CPU then parallel implementation will scale much better. In code:Please note that if you don't need
List<string>
as result andIEnumerable<string>
is enough then performance will greatly increase (and difference with parallel execution will be higher).Of course those two lines of code are not optimal and can be greatly increase (and if it's really performance critical you may pick
ParallelEnumerable.Except()
implementation as starting point for your own specific highly optimized routine).Use LINQ's
Except
method: