LINQ to remove duplicate rows from a datatable bas

Posted 2020-07-27 04:52

Question:

I have seen the following solution used for removing duplicate rows in a DataTable using LINQ:

Suppose we have a DataTable called dt that contains duplicate rows; the following statement apparently does the job:

IEnumerable<DataRow> uniqueContacts = dt.AsEnumerable().Distinct(DataRowComparer.Default);

But this only removes rows that are identical in every column. What I want is to remove all the rows that have a duplicate value in one specific column.

E.g. if we have a DataTable with a column called "Email", how can we remove the rows that share the same email value?

Answer 1:

A simple way is to use GroupBy and keep the first row of each group:

var uniqueContacts = dt.AsEnumerable()
                       .GroupBy(x => x.Field<string>("Email"))
                       .Select(g => g.First());
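
The GroupBy query returns an IEnumerable<DataRow> rather than a DataTable. If a table is needed, the rows can be copied into one with CopyToDataTable. The following is only a minimal sketch: the class name DedupDemo and the helper KeepFirstPerEmail are hypothetical names chosen for illustration, and it assumes a string column named "Email".

```csharp
using System;
using System.Data;
using System.Linq;

public static class DedupDemo
{
    // Hypothetical helper: keeps the first row seen for each distinct Email value.
    public static DataTable KeepFirstPerEmail(DataTable source)
    {
        var rows = source.AsEnumerable()
                         .GroupBy(r => r.Field<string>("Email"))
                         .Select(g => g.First())
                         .ToList();

        // CopyToDataTable throws InvalidOperationException on an empty sequence,
        // so fall back to an empty table with the same schema.
        return rows.Count > 0 ? rows.CopyToDataTable() : source.Clone();
    }

    public static void Main()
    {
        var dt = new DataTable();
        dt.Columns.Add("ID");
        dt.Columns.Add("FirstName");
        dt.Columns.Add("Email");
        dt.Rows.Add(1, "Tim", "tim@mail.com");
        dt.Rows.Add(2, "Tim1", "tim@mail.com");
        dt.Rows.Add(3, "Tim2", "tim2@mail.com");

        DataTable unique = KeepFirstPerEmail(dt);
        foreach (DataRow row in unique.Rows)
            Console.WriteLine($"{row["ID"]} {row["Email"]}");
    }
}
```

In this sample, the row with ID 2 is dropped because it shares its email with the row with ID 1, which appears first.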

You can also use Distinct with a custom comparer:

    // Sample table: rows 1 and 2 share the same email.
    DataTable dt = new DataTable();
    dt.Columns.Add("ID");
    dt.Columns.Add("FirstName");
    dt.Columns.Add("Email");
    dt.Rows.Add(1, "Tim", "tim@mail.com");
    dt.Rows.Add(2, "Tim1", "tim@mail.com");
    dt.Rows.Add(3, "Tim2", "tim2@mail.com");
    dt.Rows.Add(4, "Tim3", "tim3@mail.com");

    // Dump() is a LINQPad extension method; outside LINQPad, enumerate the result instead.
    dt.AsEnumerable().Distinct(new DataRowComparer()).Dump();

Custom row comparer:

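    // Note: this class shares its name with the built-in System.Data.DataRowComparer;
    // consider renaming it to avoid confusion.
    public class DataRowComparer : IEqualityComparer<DataRow>
    {
        // Two rows are considered equal when their Email values match.
        public bool Equals(DataRow t1, DataRow t2)
        {
            return t1.Field<string>("Email") == t2.Field<string>("Email");
        }

        // Hash the same value that Equals compares, so that rows with equal
        // emails get equal hash codes and other rows are spread across buckets.
        public int GetHashCode(DataRow t)
        {
            string email = t.Field<string>("Email");
            return email == null ? 0 : email.GetHashCode();
        }
    }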

http://msdn.microsoft.com/en-us/library/bb338049.aspx
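
The comparer approach can be generalized so that the column name is not hard-coded. The following is only a sketch under stated assumptions: ColumnValueComparer and ComparerDemo are hypothetical names, the column is assumed to hold strings, and the optional case-insensitive comparison is an assumption about how one might want to match emails, not something the question asks for.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical comparer: two rows are equal when the named string column matches.
public sealed class ColumnValueComparer : IEqualityComparer<DataRow>
{
    private readonly string _column;
    private readonly StringComparer _comparer;

    public ColumnValueComparer(string column, StringComparer comparer = null)
    {
        _column = column;
        _comparer = comparer ?? StringComparer.Ordinal;
    }

    public bool Equals(DataRow x, DataRow y)
    {
        return _comparer.Equals(x.Field<string>(_column), y.Field<string>(_column));
    }

    public int GetHashCode(DataRow row)
    {
        // StringComparer.GetHashCode rejects null, so map null values to 0.
        string value = row.Field<string>(_column);
        return value == null ? 0 : _comparer.GetHashCode(value);
    }
}

public static class ComparerDemo
{
    public static void Main()
    {
        var dt = new DataTable();
        dt.Columns.Add("Email");
        dt.Rows.Add("Tim@mail.com");
        dt.Rows.Add("tim@mail.com");
        dt.Rows.Add("tim2@mail.com");

        // Assumption: emails that differ only by letter case count as duplicates.
        var unique = dt.AsEnumerable()
                       .Distinct(new ColumnValueComparer("Email", StringComparer.OrdinalIgnoreCase));

        foreach (DataRow row in unique)
            Console.WriteLine(row["Email"]);
    }
}
```

Passing no StringComparer falls back to ordinal, case-sensitive matching, which corresponds to the == comparison used in the comparer above.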