Have searched ant tested many examples in this forum but can't get a fully working method.
I am using linq to bulk insert a list of entity classes (RemoteReadings).
Due to unique constraints I need to filter out any items already inserted.
Uniqiuness is composed of 2 columns meterid and datetime in RemoteReadings table.
// approx 5000 records (I need to do this in batches of 2000 due to a
// constraint in L2S,but can do this after get this working)
List<RemoteReading> lst = createListFromCSV();
// Option 1:
// This does not work as am comparing memory list to db list. I need to use contains() method.
// Actually am trying to accomplish this infollowing examples.
List<RemoteReading> myLst = (from ri in db.RemoteReadings
from l in lst
where l.meterid = ri.meterid
&& l.date = r.date
select ri).ToList();
////
// Option2:
// Get the list from DB that are in memory lst
List<RemoteReading> myLst = (from ri in db.RemoteReadings
where
// where in this list by comparing meaterid and datemeaured
(from l in lst
select
/// help here !
///
select ri).ToList<RemoteInterconnectorReading>();
// Option3:
// Get the list from lst that are not in database
// I am bit confused here !
// Tried also to remove from list any duplicates:
List<RemoteReading> result = List<RemoteReading>)myLst.Except(lst).ToList<RemoteReading>();
// Ultimately
db.RemoteReading.InsertAllOnSubmit(result);
db.submitChanges();
Any help please?
Due to limitations in EF, we can't join DB query with in-memory list. Also, Contains
can only be used with primitive list. So we need to make some efforts to find the duplicates on two columns.
var newItems = createListFromCSV();
var meterIds = newItems.Select(n=> n.meterid).Distinct().ToList();
var dates = newItems.Select(n=> n.date).Distinct().ToList();
var probableMatches = (from ri in db.RemoteReadings
where (meterIds.Contains(ri.meterids)
|| dates.Contains(ri.date)
select new {ri.merterid, ri.date}).ToList();
var duplicates = (from existingRi in probaleMatches
join newRi in newItems
on new {existingRi.meterid, existingRi.date}
equals {newRi.meterid, newRi.date}
select newRi).ToList();
var insertList = newItems.Except(duplicates).ToList();
db.RemoteReadings.Insert(insertList); // or whatever
With the great help of aSharma and some other tweaks, I finally got a working and tested method. As my lists contain over 5000 items I had to execute in batches to override the 2112 SQL RPC call limitation. Added some comments and credits :)
/// List<RemoteReadings> contains a list of database Entity Classes RemoteReadings
public List<RemoteReadings> removeDublicatesFirst(List<RemoteReadings> lst)
{
try
{
DataClasses1DataContext db = new DataClasses1DataContext();
var meterIds = lst.Select(n => n.meterId).Distinct().ToList();
var dates = lst.Select(n => n.mydate).Distinct().ToList();
var myfLst = new List<RemoteReadings>();
// To avoid the following SqlException, Linq query should be exceuted in batches as follows.
//{System.Data.SqlClient.SqlException
// The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect.
// Too many parameters were provided in this RPC request. The maximum is 2100.
foreach (var batch in dates.Batch(2000))
{
// Gets a list of possible matches from DB.
var probableMatches = (from ri in db.RemoteReadingss
where (meterIds.Contains(ri.meterId)
&& batch.Contains(ri.mydate))
select new { ri.meterId, ri.mydate }).ToList();
// Join the probableMatches with the lst in memory on unique
// constraints meterid.date to find any duplicates
var duplicates = (from existingRi in probableMatches
join newRi in lst
on new
{
existingRi.meterId,
existingRi.mydate
}
equals new { newRi.meterId, newRi.mydate }
select newRi).ToList();
//Add duplicates in a new List due to batch executions.
foreach (var s in duplicates)
{
myfLst.Add(s);
}
}
// Remove the duplicates from lst found in myfLst;
var insertList = lst.Except(myfLst).ToList();
return insertList;
}
catch
(Exception ex)
{
return null;
}
}
// Found this extension Class to divide IEnumerable in batches.
// http://stackoverflow.com/a/13731854/288865
public static class MyExtensions
{
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> items,
int maxItems)
{
return items.Select((item, inx) => new { item, inx })
.GroupBy(x => x.inx / maxItems)
.Select(g => g.Select(x => x.item));
}
}