Delete duplicates using Lambda

2019-02-26 21:04发布

问题:

I need some help using a lambda expression to remove duplicate entries in my Entity Framework context. I have a table with the following columns:

Id, DateOfIncident, Description, EmployeeId, IncidentTypeId, and IsAttendanceIncident

I want to remove duplicate entries where DateOfIncident, EmployeeID, IncidentTypeID and IsAttendanceIncident are the same. I do want to keep one entry. I know how to do this using a Stored Procedure with CTE in SQL, but I cannot figure out how to accomplish this task using a Lambda expression.

This code returns a list excluding my duplicates, but now how do I go about removing those duplicates that are not in this list ?

  var query = db.Incidents.Where(x => x.IsAttendanceIncident == "Y").GroupBy(x => new { x.EmployeeId, x.DateOfIncident, x.IsAttendanceIncident })
         .Select(x => x.FirstOrDefault());

UPDATE:

So I went ahead and wrote the custom IEqualityComparer. Now how do I remove the incidents in my context that are not in my distinctItems using the id ?

 static void Main(string[] args)
    {
        DALIncidents.AttendanceEntities1 db = new DALIncidents.AttendanceEntities1();

       IEnumerable<DALIncidents.Incident> distinctItems = db.Incidents.Where(c => c.IsAttendanceIncident == "Y");
       distinctItems = distinctItems.Distinct(new DALIncidents.DistinctIncidentComparer());

       Console.ReadLine(); 
    }

回答1:

var query = db.Incidents
    .Where(x => x.IsAttendanceIncident == "Y")
    .GroupBy(x => new { x.EmployeeId, x.DateOfIncident, x.IsAttendanceIncident })

Example 1:

    .Select(x => x.FirstOrDefault());  // your original code which retrieves entities to not delete

var dupes = db.Incidents.Except( query ); // get entities to delete

Example 2:

    .SelectMany( x => x.OrderBy( y => y.Id ).Skip(1) ); // gets dupes directly

var dupes = query; // already have what we need

And finally:

foreach( var dupe in dupes )
{
    db.Incidents.Remove( dupe );
}

Example SQL generated from a test context I used earlier where Person entity has a 1:N relationship with watches:

C#:

context.Persons.SelectMany(x => x.Watches.OrderBy(y => y.Id).Skip(1))

Generated SQL:

SELECT 
1 AS [C1], 
[Skip1].[Id] AS [Id], 
[Skip1].[Brand] AS [Brand], 
[Skip1].[Person_Id] AS [Person_Id]
FROM  [dbo].[Persons] AS [Extent1]
CROSS APPLY  (SELECT [Project1].[Id] AS [Id], [Project1].[Brand] AS [Brand], [Project1].[Person_Id] AS [Person_Id]
    FROM ( SELECT [Project1].[Id] AS [Id], [Project1].[Brand] AS [Brand], [Project1].[Person_Id] AS [Person_Id], row_number() OVER (ORDER BY [Project1].[Id] ASC) AS [row_number]
        FROM ( SELECT 
            [Extent2].[Id] AS [Id], 
            [Extent2].[Brand] AS [Brand], 
            [Extent2].[Person_Id] AS [Person_Id]
            FROM [dbo].[Watches] AS [Extent2]
            WHERE [Extent1].[Id] = [Extent2].[Person_Id]
        )  AS [Project1]
    )  AS [Project1]
    WHERE [Project1].[row_number] > 1 ) AS [Skip1]


回答2:

You will need to use the Distinct function and if you want to just some of the fields then you need to create an Equality Comparer. (IEqualityComparer)

Ahh just seen the comment above, check this out for more:

Remove duplicates in the list using linq



回答3:

var query = db.Incidents.Where(x => x.IsAttendanceIncident == "Y")
                .GroupBy(x => new { x.Id, x.EmployeeId, x.DateOfIncident, x.IsAttendanceIncident })
                .Select(x => x.FirstOrDefault());


var query2 = from duplicate in db.Incidents
                 .Where(x => x.IsAttendanceIncident == "Y" && !query.Any(i => i.Id == duplicate.Id));

query2 will now just contain the duplicates?



回答4:

var query = db.Incidents.Where(x => x.IsAttendanceIncident == "Y").GroupBy(x => new { x.EmployeeId, x.DateOfIncident, x.IsAttendanceIncident })
     .SelectMany(x => x.Skip(1));