I have searched this site and the web but was unable to find a simple solution to this problem.
I have a DataTable with about 20 columns and 10K rows. I need to remove the duplicate rows in this DataTable based on 4 key columns. Doesn't .NET have a function that does this? The closest thing I found was dataTable.DefaultView.ToTable(true, array of columns to display), but this function does a distinct on all the columns.
It would be great if someone could help me with this.
EDIT: I am sorry for not being clear on this. This datatable is being created by reading a CSV file and not from a DB. So using an SQL query is not an option.
I think the best way to remove duplicates from a DataTable is to use LINQ and MoreLINQ.
Code (LINQ):
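The answer's original snippet is not shown here, so the following is only a minimal sketch of one way to do this with plain LINQ, assuming the four key columns are passed in by name. The class and method names are hypothetical. It groups rows by the key values and keeps the first row of each group.

```csharp
using System.Data;
using System.Linq;

public static class DataTableDedup
{
    // Keeps the first row for each distinct combination of the four key columns.
    // k1..k4 are the key column names supplied by the caller (hypothetical here).
    public static DataTable RemoveDuplicates(
        DataTable table, string k1, string k2, string k3, string k4)
    {
        var firstRows = table.AsEnumerable()
            // Anonymous types compare by value, so equal key tuples share a group.
            .GroupBy(r => new { A = r[k1], B = r[k2], C = r[k3], D = r[k4] })
            .Select(g => g.First())
            .ToList();

        // CopyToDataTable throws on an empty sequence, so return an empty clone instead.
        return firstRows.Count == 0 ? table.Clone() : firstRows.CopyToDataTable();
    }
}
```

All 20 columns are preserved in the result; only the grouping uses the key columns.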
Blog Article : Remove Duplicate rows records from DataTable Asp.net c#
Code (MoreLINQ):
Note: MoreLINQ is a separate library that you need to add to your project. It provides a function called DistinctBy, in which you specify the property on which you want to find distinct objects.
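As a rough illustration of the DistinctBy idea, here is a sketch that keys each row on four columns. It is written against Enumerable.DistinctBy, which is built into System.Linq from .NET 6 onward; with MoreLINQ on older frameworks the call has the same shape once the library is referenced. The column names "Key1" to "Key4" and the class name are assumptions for illustration, and the key columns are assumed to hold strings (as they would when loaded from a CSV file).

```csharp
using System.Data;
using System.Linq; // Enumerable.DistinctBy requires .NET 6 or later

public static class DistinctByExample
{
    // Returns a new table holding the first row seen for each key combination.
    // "Key1".."Key4" are hypothetical column names.
    public static DataTable DistinctByKeys(DataTable table)
    {
        var rows = table.AsEnumerable()
            // A value tuple compares element by element, so it works as a composite key.
            .DistinctBy(r => (r.Field<string>("Key1"), r.Field<string>("Key2"),
                              r.Field<string>("Key3"), r.Field<string>("Key4")))
            .ToList();

        // CopyToDataTable throws when there are no rows, so fall back to an empty clone.
        return rows.Count == 0 ? table.Clone() : rows.CopyToDataTable();
    }
}
```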
Blog article : Using moreLinq DistinctBy method to remove duplicate records
Use a query instead of functions:
I wasn't keen on using the Linq solution above so I wrote this:
Additionally, this works on ALL columns rather than a specific column index:
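The code from this answer is missing here. A LINQ-free version that compares all columns might look roughly like the sketch below. The class and method names are hypothetical, and it assumes the unit-separator character never appears in the data, since that character is used to join the values into a lookup key.

```csharp
using System.Collections.Generic;
using System.Data;

public static class AllColumnDedup
{
    // Removes, in place, every row whose values match an earlier row in all columns.
    public static void RemoveDuplicateRows(DataTable table)
    {
        var seen = new HashSet<string>();
        var duplicates = new List<DataRow>();

        foreach (DataRow row in table.Rows)
        {
            // Assumption: '\u001F' does not occur in any cell value.
            // Note that DBNull and the empty string both render as "" in the key.
            string key = string.Join("\u001F", row.ItemArray);
            if (!seen.Add(key))
                duplicates.Add(row); // collect first; removing while enumerating would throw
        }

        foreach (DataRow row in duplicates)
            table.Rows.Remove(row);
    }
}
```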
See the question "How can I remove duplicate rows?" and adjust the query there to join on your 4 key columns.
EDIT: with your new information I believe the easiest way would be to implement IEqualityComparer<T> and use Distinct on your data rows. Otherwise if you're working with IEnumerable/IList instead of DataTable/DataRow, it is certainly possible with some LINQ-to-objects kung-fu.
EDIT: example IEqualityComparer
You can use it like this:
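Since the original snippets are not shown, here is a minimal sketch of what such a comparer and its use with Distinct could look like. The class names and the idea of passing the key column names to the constructor are assumptions for illustration, not the answer's exact code.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Treats two rows as equal when they agree on every configured key column.
public sealed class KeyColumnComparer : IEqualityComparer<DataRow>
{
    private readonly string[] _keys;

    public KeyColumnComparer(params string[] keyColumns) { _keys = keyColumns; }

    public bool Equals(DataRow x, DataRow y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return _keys.All(k => object.Equals(x[k], y[k]));
    }

    public int GetHashCode(DataRow row)
    {
        unchecked
        {
            // Combine the key values only, so rows that are equal hash equally.
            int hash = 17;
            foreach (string k in _keys)
                hash = hash * 31 + row[k].GetHashCode(); // cells are never null (DBNull.Value instead)
            return hash;
        }
    }
}

public static class KeyColumnComparerUsage
{
    // Distinct keeps the first row encountered for each key combination.
    public static DataTable Deduplicate(DataTable table, params string[] keyColumns)
    {
        var rows = table.AsEnumerable().Distinct(new KeyColumnComparer(keyColumns)).ToList();
        return rows.Count == 0 ? table.Clone() : rows.CopyToDataTable();
    }
}
```

Overriding GetHashCode consistently with Equals matters here, because Distinct uses a hash set internally and only calls Equals on rows whose hash codes collide.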
Try this
Suppose dtInput is your DataTable with duplicate records.
dtFinal is a new DataTable that should receive the rows with the duplicates filtered out.
So the code will be something like this:
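The answer's code did not survive, so this is only a guess at the general shape: clone the schema of dtInput into dtFinal and import each row whose key has not been seen yet. The helper name, the keyColumns parameter, and the use of the unit-separator character as a key delimiter are assumptions.

```csharp
using System.Collections.Generic;
using System.Data;

public static class CopyUnique
{
    // Builds dtFinal from dtInput, keeping the first row for each key combination.
    public static DataTable BuildFinal(DataTable dtInput, params string[] keyColumns)
    {
        DataTable dtFinal = dtInput.Clone(); // same columns, no rows
        var seen = new HashSet<string>();

        foreach (DataRow row in dtInput.Rows)
        {
            var parts = new string[keyColumns.Length];
            for (int i = 0; i < keyColumns.Length; i++)
                parts[i] = row[keyColumns[i]].ToString();

            // Assumption: '\u001F' never appears in the key values.
            string key = string.Join("\u001F", parts);
            if (seen.Add(key))
                dtFinal.ImportRow(row); // copies the row's values into dtFinal
        }

        return dtFinal;
    }
}
```

dtInput is left untouched, which is convenient if the original rows are still needed for reporting what was dropped.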
Found this on bytes.com:
That would allow you to access your data via sql queries, as others proposed.