I have seen the following solution used for removing duplicate rows in a DataTable using LINQ:
Say we have a DataTable called dt with duplicate rows; then the following statement apparently does the job:
IEnumerable<DataRow> uniqueContacts = dt.AsEnumerable().Distinct(DataRowComparer.Default);
But this only removes rows that are identical in every column. What I want to do is remove all the rows that have duplicate values in a specific column.
E.g., if we have a DataTable with a column called "Email", how can we remove all the rows that have the same email value?
The simple way is to use GroupBy and keep the first row of each group:
var uniqueContacts = dt.AsEnumerable()
    .GroupBy(x => x.Field<string>("Email"))
    .Select(g => g.First());
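The GroupBy query above yields an IEnumerable<DataRow>, not a DataTable. If you need a table back, one option is CopyToDataTable. Here is a minimal sketch, assuming the key column holds strings; the class and method names (DataTableDedup, DistinctByColumn) are made up for illustration and are not part of any library:

```csharp
using System;
using System.Data;
using System.Linq;

public static class DataTableDedup
{
    // Hypothetical helper: keeps the first row for each distinct value
    // found in the given string column and returns the result as a new table.
    public static DataTable DistinctByColumn(DataTable source, string columnName)
    {
        var rows = source.AsEnumerable()
            .GroupBy(r => r.Field<string>(columnName))
            .Select(g => g.First())
            .ToList();

        // CopyToDataTable throws InvalidOperationException on an empty sequence,
        // so fall back to an empty table with the same schema.
        return rows.Count > 0 ? rows.CopyToDataTable() : source.Clone();
    }
}
```

Calling DataTableDedup.DistinctByColumn(dt, "Email") returns a new table and leaves dt itself unchanged. Which of the duplicate rows survives depends on row order, since only the first row of each group is kept.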
You can also use Distinct with a custom comparer:
DataTable dt = new DataTable();
dt.Columns.Add("ID");
dt.Columns.Add("FirstName");
dt.Columns.Add("Email");
dt.Rows.Add(1,"Tim","tim@mail.com");
dt.Rows.Add(2,"Tim1","tim@mail.com");
dt.Rows.Add(3,"Tim2","tim2@mail.com");
dt.Rows.Add(4,"Tim3","tim3@mail.com");
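// Dump() is a LINQPad extension method that prints the result.
dt.AsEnumerable().Distinct(new EmailComparer()).Dump();
Custom row comparer (named EmailComparer here to avoid confusion with the built-in System.Data.DataRowComparer):
public class EmailComparer : IEqualityComparer<DataRow>
{
    // Two rows are considered equal when their "Email" values match.
    public bool Equals(DataRow t1, DataRow t2)
    {
        return t1.Field<string>("Email") == t2.Field<string>("Email");
    }

    // Hash on the same key that Equals compares. DataRow.ToString() returns
    // the type name for every row, so hashing it puts all rows in one bucket.
    public int GetHashCode(DataRow t)
    {
        string email = t.Field<string>("Email");
        return email == null ? 0 : email.GetHashCode();
    }
}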
http://msdn.microsoft.com/en-us/library/bb338049.aspx