I need some help using a lambda expression to remove duplicate entries in my Entity Framework context. I have a table with the following columns:
Id, DateOfIncident, Description, EmployeeId, IncidentTypeId, and IsAttendanceIncident
I want to remove duplicate entries where DateOfIncident, EmployeeID, IncidentTypeID and IsAttendanceIncident are the same.
I do want to keep one entry. I know how to do this using a Stored Procedure with CTE in SQL, but I cannot figure out how to accomplish this task using a Lambda expression.
This code returns a list excluding my duplicates, but now how do I go about removing those duplicates that are not in this list ?
var query = db.Incidents.Where(x => x.IsAttendanceIncident == "Y").GroupBy(x => new { x.EmployeeId, x.DateOfIncident, x.IsAttendanceIncident })
.Select(x => x.FirstOrDefault());
UPDATE:
So I went ahead and wrote the custom IEqualityComparer. Now how do I remove the incidents in my context that are not in my distinctItems using the id ?
static void Main(string[] args)
{
DALIncidents.AttendanceEntities1 db = new DALIncidents.AttendanceEntities1();
IEnumerable<DALIncidents.Incident> distinctItems = db.Incidents.Where(c => c.IsAttendanceIncident == "Y");
distinctItems = distinctItems.Distinct(new DALIncidents.DistinctIncidentComparer());
Console.ReadLine();
}
var query = db.Incidents
.Where(x => x.IsAttendanceIncident == "Y")
.GroupBy(x => new { x.EmployeeId, x.DateOfIncident, x.IsAttendanceIncident })
Example 1:
.Select(x => x.FirstOrDefault()); // your original code which retrieves entities to not delete
var dupes = db.Incidents.Except( query ); // get entities to delete
Example 2:
.SelectMany( x => x.OrderBy( y => y.Id ).Skip(1) ); // gets dupes directly
var dupes = query; // already have what we need
And finally:
foreach( var dupe in dupes )
{
db.Incidents.Remove( dupe );
}
Example SQL generated from a test context I used earlier where Person entity has a 1:N relationship with watches:
C#:
context.Persons.SelectMany(x => x.Watches.OrderBy(y => y.Id).Skip(1))
Generated SQL:
SELECT
1 AS [C1],
[Skip1].[Id] AS [Id],
[Skip1].[Brand] AS [Brand],
[Skip1].[Person_Id] AS [Person_Id]
FROM [dbo].[Persons] AS [Extent1]
CROSS APPLY (SELECT [Project1].[Id] AS [Id], [Project1].[Brand] AS [Brand], [Project1].[Person_Id] AS [Person_Id]
FROM ( SELECT [Project1].[Id] AS [Id], [Project1].[Brand] AS [Brand], [Project1].[Person_Id] AS [Person_Id], row_number() OVER (ORDER BY [Project1].[Id] ASC) AS [row_number]
FROM ( SELECT
[Extent2].[Id] AS [Id],
[Extent2].[Brand] AS [Brand],
[Extent2].[Person_Id] AS [Person_Id]
FROM [dbo].[Watches] AS [Extent2]
WHERE [Extent1].[Id] = [Extent2].[Person_Id]
) AS [Project1]
) AS [Project1]
WHERE [Project1].[row_number] > 1 ) AS [Skip1]
You will need to use the Distinct function and if you want to just some of the fields then you need to create an Equality Comparer. (IEqualityComparer)
Ahh just seen the comment above, check this out for more:
Remove duplicates in the list using linq
var query = db.Incidents.Where(x => x.IsAttendanceIncident == "Y")
.GroupBy(x => new { x.Id, x.EmployeeId, x.DateOfIncident, x.IsAttendanceIncident })
.Select(x => x.FirstOrDefault());
var query2 = from duplicate in db.Incidents
.Where(x => x.IsAttendanceIncident == "Y" && !query.Any(i => i.Id == duplicate.Id));
query2 will now just contain the duplicates?
var query = db.Incidents.Where(x => x.IsAttendanceIncident == "Y").GroupBy(x => new { x.EmployeeId, x.DateOfIncident, x.IsAttendanceIncident })
.SelectMany(x => x.Skip(1));