I want to remove duplicates from list, without changing order of unique elements in the list.
Jon Skeet & others have suggested to use following
list = list.Distinct().ToList();
removing duplicates from a list C#
Remove duplicates from a List<T> in C#
Is it guaranteed that the order of unique elements would be same as before? If yes, please give a reference that confirms this as I couldn't find anything on it in documentation.
By default when use Distinct linq operator uses Equals method but you can use your own
IEqualityComparer<T>
object to specify when two objects are equals with a custom logic implementingGetHashCode
andEquals
method. Remember that:GetHashCode
should not used heavy cpu comparision ( eg. use only some obvious basic checks ) and its used as first to state if two object are surely different ( if different hash code are returned ) or potentially the same ( same hash code ). In this latest case when two object have the same hashcode the framework will step to check using the Equals method as a final decision about equality of given objects.After you have
MyType
and aMyTypeEqualityComparer
classes follow code not ensure the sequence maintain its order:In follow sci library I implemented an extension method to ensure Vector3D set maintain the order when use a specific extension method
DistinctKeepOrder
:relevant code follows:
In short
Vector3DWithOrder
encapsulate the type and an order integer, whileVector3DWithOrderEqualityComparer
encapsulates original type comparer.and this is the method helper to ensure order maintained
Note : further research could allow to find a more general ( uses of interfaces ) and optimized way ( without encapsulate the object ).
This highly depends on your linq-provider. On Linq2Objects you can stay on the internal source-code for
Distinct
, which makes one assume the original order is preserved.However for other providers that resolve to some kind of SQL for example, that isn´t neccessarily the case, as an
ORDER BY
-statement usually comes after any aggregation (such asDistinct
). So if your code is this:this is translated to something similar to the following in SQL:
This obviously first groups your data and sorts it afterwards. Now you´re stuck on the DBMS own logic of how to execute that. On some DBMS this isn´t even allowed. Imagine the following data:
when executing
myArr.OrderBy(x => x.anothercol).GroupBy(x => x.mycol)
we assume the following result:But the DBMS may aggregate the anothercol-column so, that allways the value of the first row is used, resulting in the following data:
which after ordering will result in this:
This is similar to the following:
which is the completely reverse order than what you expected.
You see the execution-plan may vary depending on what the underlying provider is. This is why there´s no guarantee about that in the docs.
It's not guaranteed, but it's the most obvious implementation. It would be hard to implement in a streaming manner (i.e. such that it returned results as soon as it could, having read as little as it could) without returning them in order.
You might want to read my blog post on the Edulinq implementation of Distinct().
Note that even if this were guaranteed for LINQ to Objects (which personally I think it should be) that wouldn't mean anything for other LINQ providers such as LINQ to SQL.
The level of guarantees provided within LINQ to Objects is a little inconsistent sometimes, IMO. Some optimizations are documented, others not. Heck, some of the documentation is flat out wrong.
Yes, Enumerable.Distinct preserves order. Assuming the method to be lazy "yields distinct values are soon as they are seen", it follows automatically. Think about it.
The .NET Reference source confirms. It returns a subsequence, the first element in each equivalence class.
The .NET Core implementation is similar.
Frustratingly, the documentation for Enumerable.Distinct is confused on this point:
I can only imagine they mean "the result sequence is not sorted." You could implement Distinct by presorting then comparing each element to the previous, but this would not be lazy as defined above.
Yes, in order of first occurrence in original list. It is guaranteed for .Net Framework 3.5
I did a little investigation with Reflector. After disassembling System.Core.dll, Version=3.5.0.0 you can see that Distinct() is an extension method, which looks like this:
So, interesting here is DistinctIterator, which implements IEnumerable and IEnumerator. Here is simplified (goto and lables removed) implementation of this IEnumerator:
As you can see - enumerating goes in order provided by source enumerable (list, on which we are callin Distinct). Hashset used only for determining whether we already returned such element or not. If not, we are returning it, else - continue enumerating on source.
So, it is guaranteed, that Distinct() will return elements exactly in same order, which are provided by collection to which Distinct was applied.
According to the documentation the sequence is unordered.