I have a process that extracts customer info from multiple databases (MySql) based on a timestamp. I store this data into a DataTable
. The data table represents updates to existing customer info as well as new customer info.
I want to delete any dupes in the destination database (SqlServer) based on one constant value, CompanyID
, and the CustomerID
. So, I thought a join would give me the RecordIDs of the dupes in the destination DB, pass the List<int>
(or some collection mechanism) to the DELETE
method.
What I have:
using (var context = new DataContext(SqlConnection))
{
var tblSource = context.GetTable<tblCustomerInfo>();
var dupeIDs = from currCust in tblSource
join newCust in myTable.AsEnumerable() on currCust.CompanyID equals newCust.Field<string>("CompanyID")
where currCust.CustomerID.Equals(newCust.Field<int>("CustomerID")
select currCust.RecordID;
}
This obviously does not work. I will update with the exact error messages in a bit, but this doesn't compile.
First, is my join syntax even correct for what I am wanting to achieve?
Second, how can I write this Linq to join between a DataTable and the destination SqlServer database?
Afterthought - is it possible to, once I have a collection of dupe RecordIDs, use Linq to DELETE records from the destination database?
Edit
To clarify the process, I have incoming data tables like so and contained in a DataSet
:
Table1
CompanyID CustomerID Field1 Field2 ....
1 5 ... ...
1 15 ... ...
Table2
CompanyID CustomerID Field1 Field2 ....
10 125 ... ...
10 145 ... ...
Which will all go into a single database:
Destination DB
CompanyID CustomerID Field1 Field2 ....
1 5 ... ...
1 15 ... ...
1 27 ... ...
5 15 ... ...
10 125 ... ...
10 145 ... ...
11 100 ... ...
So, in this case I would delete from the destination table the items that match from tables 1 & 2. The destination database will be growing constantly so creating a List of CustomerID does not seem feasible. However, I expect daily imports of new and updated customer info to be relatively small (in the hundreds, maybe near 1000 records).
If I cannot write a single join what other method for completing this process would be appropriate? I am trying to figure something out since it looks like I cannot actually mix Linq-to-Sql and Linq-to-Objects.
Is it possible to somehow map my data table to the entity datamap, tbl_CustomerInfo
, filling an otherwise immutable var, then perform the join?
Update
Here is what I have accomplished at this point and I get the results I expect from dupes
:
using (DataContext context = new DataContext(SqlConnection)
{
var custInfo = context.GetTable<tbl_CustomerInfo>();
string compID = ImportCust.Rows[0]["CompanyID"].ToString();
var imports = from cust in ImportCust.AsEnumerable()
select cust.Field<int>("CustomerID");
var dupes = from cust in custInfo
join import in imports
on cust.CustomerID equals import
where cust.CompanyID == compID
select cust;
custInfo.DeleteOnSubmit(/* what goes here */);
context.SubmitChanges();
}
My question now is, what goes into the DeleteOnSubmit(...)
? I feel like I have gotten so close only to be foiled by this.
Alternatively, you could create two lists of
List<int>
, containing id's from your two sources, then use the Intersect LINQ operator to find the common items.I usually tackle all of this in a stored proc for efficiency.
Add an identity field to your destination table to uniquely identify the records, then use a query like this:
Here is what I have that works:
If there is a more efficient method, I'm interested in options.