Possible to write a Join between Sql and DataTable

2019-06-10 18:38发布

问题:

I have a process that extracts customer info from multiple databases (MySql) based on a timestamp. I store this data into a DataTable. The data table represents updates to existing customer info as well as new customer info.

I want to delete any dupes in the destination database (SqlServer) based on one constant value, CompanyID, and the CustomerID. So, I thought a join would give me the RecordIDs of the dupes in the destination DB, pass the List<int> (or some collection mechanism) to the DELETE method.

What I have:

using (var context = new DataContext(SqlConnection))
{
    var tblSource = context.GetTable<tblCustomerInfo>();
    var dupeIDs = from currCust in tblSource
                  join newCust in myTable.AsEnumerable() on currCust.CompanyID equals newCust.Field<string>("CompanyID")
                  where currCust.CustomerID.Equals(newCust.Field<int>("CustomerID")
                  select currCust.RecordID;
}

This obviously does not work. I will update with the exact error messages in a bit, but this doesn't compile.

First, is my join syntax even correct for what I am wanting to achieve?

Second, how can I write this Linq to join between a DataTable and the destination SqlServer database?

Afterthought - is it possible to, once I have a collection of dupe RecordIDs, use Linq to DELETE records from the destination database?

Edit To clarify the process, I have incoming data tables like so and contained in a DataSet:

Table1
CompanyID    CustomerID    Field1    Field2    ....
   1             5          ...       ...
   1             15         ...       ...

Table2
CompanyID    CustomerID    Field1    Field2    ....
   10           125         ...       ...
   10           145         ...       ...

Which will all go into a single database:

Destination DB  
CompanyID    CustomerID    Field1    Field2    ....   
   1              5          ...       ...  
   1              15         ...       ...    
   1              27         ...       ... 
   5              15         ...       ... 
   10             125        ...       ...
   10             145        ...       ... 
   11             100        ...       ...

So, in this case I would delete from the destination table the items that match from tables 1 & 2. The destination database will be growing constantly so creating a List of CustomerID does not seem feasible. However, I expect daily imports of new and updated customer info to be relatively small (in the hundreds, maybe near 1000 records).

If I cannot write a single join what other method for completing this process would be appropriate? I am trying to figure something out since it looks like I cannot actually mix Linq-to-Sql and Linq-to-Objects.

Is it possible to somehow map my data table to the entity datamap, tbl_CustomerInfo, filling an otherwise immutable var, then perform the join?

Update

Here is what I have accomplished at this point and I get the results I expect from dupes:

using (DataContext context = new DataContext(SqlConnection)
{
    var custInfo = context.GetTable<tbl_CustomerInfo>();

    string compID = ImportCust.Rows[0]["CompanyID"].ToString();

    var imports = from cust in ImportCust.AsEnumerable()
                  select cust.Field<int>("CustomerID");

    var dupes = from cust in custInfo
                join import in imports
                on cust.CustomerID equals import
                where cust.CompanyID == compID
                select cust;

    custInfo.DeleteOnSubmit(/* what goes here */);
    context.SubmitChanges();
}

My question now is, what goes into the DeleteOnSubmit(...)? I feel like I have gotten so close only to be foiled by this.

回答1:

I usually tackle all of this in a stored proc for efficiency.

Add an identity field to your destination table to uniquely identify the records, then use a query like this:

DELETE d
FROM DestinationTable d JOIN (
   Select CompanyID, CustomerID, Min(UniqueID) AS FirstRecID
    FROM DestinationTable
    GROUP BY CompanyID, CustomerID) u on u.CompanyID=d.CompanyID AND u.CustomerID=d.CustomerID
WHERE d.UniqueID <> u.FirstRecID


回答2:

Alternatively, you could create two lists of List<int>, containing id's from your two sources, then use the Intersect LINQ operator to find the common items.

List<int> a = new List<int>{1,2,3,4,5,6,8, 10};
List<int> b = new List<int>{1,2,99,5,6,8, 10};
var c= a.Intersect(b);  //returns the items common to both lists


回答3:

Here is what I have that works:

using (DataContext context = new DataContext(SqlConnection)
{
    var custInfo = context.GetTable<tbl_CustomerInfo>();

    string compID = ImportCust.Rows[0]["CompanyID"].ToString();

    var imports = from cust in ImportCust.AsEnumerable()
                  select cust.Field<int>("CustomerID");

    var dupes = from import in imports
                join cust in custInfo
                on import equals cust.CustomerID 
                where cust.CompanyID== pivnum
                select cust;

    var records = dupes.GetEnumerator();

    while (records.MoveNext())
    { custInfo.DeleteOnSubmit(records.Current); }

    context.SubmitChanges();
}

If there is a more efficient method, I'm interested in options.