I am trying to load a list of distinct colors from a previously loaded list of products on a page. To pull in the products I do this:
var products = Products
    .Include(p => p.ProductColor)
    .ToList();
Then I do some processing on the products, and then I want to get a list of all of the distinct colors used by the products, so I do this:
var colors = products
    .Select(p => p.ProductColor)
    .Distinct();
And this works great. However, if I add a call to .AsNoTracking() to the original products call, I now get an entry in my color list for each entry in the product list.
Why is there a difference between these two? Is there a way to keep Entity Framework from tracking the objects (they're only being read) and still get the desired behavior?
Here is my query after adding the call to AsNoTracking():
var products = Products
    .AsNoTracking()
    .Include(p => p.ProductColor)
    .ToList();
AsNoTracking "breaks" Distinct because AsNoTracking "breaks" identity mapping. Since entities loaded with AsNoTracking() are not attached to the context cache, EF materializes a new entity for every row returned from the query. When tracking is enabled, it instead checks whether an entity with the same key value already exists in the context and, if so, reuses the attached object instance rather than creating a new object.
For example, if you have 2 products and both are Green:
Without AsNoTracking() your query will materialize 3 objects: 2 Product objects and 1 ProductColor object (Green). Product 1 has a reference to Green (in its ProductColor property) and Product 2 has a reference to the same object instance Green, i.e.
object.ReferenceEquals(product1.ProductColor, product2.ProductColor) == true
With AsNoTracking() your query will materialize 4 objects: 2 Product objects and 2 ProductColor objects (both represent Green and have the same key value). Product 1 has a reference to Green (in its ProductColor property) and Product 2 also has a reference to Green, but this is another object instance, i.e.
object.ReferenceEquals(product1.ProductColor, product2.ProductColor) == false
Now, if you call Distinct() on a collection in memory (LINQ-to-Objects), the parameterless overload uses the default equality comparer, which for a class that does not override Equals and GetHashCode compares object references. So in case 1 you get only 1 Green object, but in case 2 you get 2 Green objects.
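The difference can be reproduced without a database. The following is a minimal sketch in plain LINQ-to-Objects. The ProductColor and Product classes, their constructors, and the DistinctDemo helper are hypothetical stand-ins for the real entities, and sharing or not sharing a ProductColor instance only imitates what tracking and AsNoTracking() do during materialization.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the real entity classes (assumption).
public sealed class ProductColor
{
    public ProductColor(int productColorId, string name)
    {
        ProductColorId = productColorId;
        Name = name;
    }

    public int ProductColorId { get; }
    public string Name { get; }
}

public sealed class Product
{
    public Product(int productId, ProductColor productColor)
    {
        ProductId = productId;
        ProductColor = productColor;
    }

    public int ProductId { get; }
    public ProductColor ProductColor { get; }
}

public static class DistinctDemo
{
    // Imitates a tracked query: both products point at one shared Green instance.
    public static List<Product> SharedInstanceProducts()
    {
        var green = new ProductColor(1, "Green");
        return new List<Product> { new Product(1, green), new Product(2, green) };
    }

    // Imitates AsNoTracking(): each product gets its own Green instance.
    public static List<Product> SeparateInstanceProducts()
    {
        return new List<Product>
        {
            new Product(1, new ProductColor(1, "Green")),
            new Product(2, new ProductColor(1, "Green"))
        };
    }

    // ProductColor does not override Equals/GetHashCode, so the
    // parameterless Distinct() compares object references here.
    public static int CountDistinctColors(IEnumerable<Product> products)
    {
        return products.Select(p => p.ProductColor).Distinct().Count();
    }

    public static void Main()
    {
        Console.WriteLine(CountDistinctColors(SharedInstanceProducts()));   // prints 1
        Console.WriteLine(CountDistinctColors(SeparateInstanceProducts())); // prints 2
    }
}
```

Both lists describe the same data; only the number of ProductColor instances differs, and that alone changes what Distinct() returns.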
To get the desired result after you have run the query with AsNoTracking() you need a comparison by the entity key. You can either use the second overload of Distinct, which takes an IEqualityComparer as a parameter. An example of its implementation is here, and you would use the key property of ProductColor to compare two objects.
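As a rough illustration of the comparer approach, here is a minimal sketch. The class name ProductColorKeyComparer, the ComparerDemo helper, and the shape of ProductColor (an int key named ProductColorId) are assumptions made for the example, not taken from the original model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the real entity class (assumption).
public sealed class ProductColor
{
    public ProductColor(int productColorId, string name)
    {
        ProductColorId = productColorId;
        Name = name;
    }

    public int ProductColorId { get; }
    public string Name { get; }
}

// Treats two ProductColor objects as equal when their keys match.
public sealed class ProductColorKeyComparer : IEqualityComparer<ProductColor>
{
    public bool Equals(ProductColor x, ProductColor y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.ProductColorId == y.ProductColorId;
    }

    // Must agree with Equals: equal keys produce equal hash codes.
    public int GetHashCode(ProductColor obj)
    {
        return obj.ProductColorId.GetHashCode();
    }
}

public static class ComparerDemo
{
    public static List<ProductColor> DistinctColors(IEnumerable<ProductColor> colors)
    {
        return colors.Distinct(new ProductColorKeyComparer()).ToList();
    }

    public static void Main()
    {
        // Two separate instances that represent the same color, as after AsNoTracking().
        var colors = new List<ProductColor>
        {
            new ProductColor(1, "Green"),
            new ProductColor(1, "Green")
        };

        Console.WriteLine(DistinctColors(colors).Count); // prints 1
    }
}
```

In the query from the question this would presumably be used as products.Select(p => p.ProductColor).Distinct(new ProductColorKeyComparer()).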
Or, which seems easier to me than the tedious IEqualityComparer implementation, you can rewrite the Distinct() using a GroupBy (with the ProductColor key property as the grouping key):
var colors = products
    .Select(p => p.ProductColor)
    .GroupBy(pc => pc.ProductColorId)
    .Select(g => g.First());
The First() basically means that you throw all duplicates away and keep just the first object instance per key value.
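To see which instance survives, the GroupBy rewrite can be tried on in-memory data. This is only a sketch: the ProductColor class and the GroupByDemo helper are hypothetical, and the Name values are made different on purpose so the kept instance is visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the real entity class (assumption).
public sealed class ProductColor
{
    public ProductColor(int productColorId, string name)
    {
        ProductColorId = productColorId;
        Name = name;
    }

    public int ProductColorId { get; }
    public string Name { get; }
}

public static class GroupByDemo
{
    // Groups by key and keeps the first instance of each group.
    public static List<ProductColor> DistinctByKey(IEnumerable<ProductColor> colors)
    {
        return colors
            .GroupBy(pc => pc.ProductColorId)
            .Select(g => g.First())
            .ToList();
    }

    public static void Main()
    {
        var colors = new List<ProductColor>
        {
            new ProductColor(1, "Green (first instance)"),
            new ProductColor(2, "Red"),
            new ProductColor(1, "Green (second instance)")
        };

        foreach (var color in DistinctByKey(colors))
        {
            Console.WriteLine(color.ProductColorId + ": " + color.Name);
        }
        // prints:
        // 1: Green (first instance)
        // 2: Red
    }
}
```

On .NET 6 or later, Enumerable.DistinctBy(pc => pc.ProductColorId) should give the same result with less code, assuming that framework version is available to you.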