How to avoid geometric slowdown with large Linq transactions

Published 2019-07-31 23:17

Question:

I've written some really nice, funky libraries for use in LinqToSql. (Some day when I have time to think about it I might make them open source... :) )

Anyway, I'm not sure if this is related to my libraries or not, but I've discovered that when I have a large number of changed objects in one transaction, and then call DataContext.GetChangeSet(), things start getting reaalllly slooowwwww. When I break into the code, I find that my program is spinning its wheels doing an awful lot of Equals() comparisons between the objects in the change set. I can't guarantee this is true, but I suspect that if there are n objects in the change set, then the call to GetChangeSet() is causing every object to be compared to every other object for equivalence, i.e. at best (n^2-n)/2 calls to Equals()...

Yes, of course I could commit each object separately, but that kinda defeats the purpose of transactions. And in the program I'm writing, I could have a batch job containing 100,000 separate items that all need to be committed together. That's around 5 billion comparisons.

So the question is: (1) is my assessment of the situation correct? Do you get this behavior in pure, textbook LinqToSql, or is this something my libraries are doing? And (2) is there a standard/reasonable workaround so that I can create my batch without making the program geometrically slower with every extra object in the change set?
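For reference, one rough way to test the assumption is to time GetChangeSet() while the number of pending inserts grows: if the elapsed time roughly quadruples every time n doubles, the cost really is quadratic. Below is a minimal timing sketch along those lines; MyDataContext, its Items table and the Item entity are placeholders standing in for whatever the real model looks like:

// Rough timing sketch to check the quadratic suspicion.
// MyDataContext, Items and Item are placeholders for the real model.
using System;
using System.Diagnostics;

foreach (var n in new[] { 1000, 2000, 4000, 8000 }) {
  using (var ctx = new MyDataContext()) {
    for (int i = 0; i < n; i++)
      ctx.Items.InsertOnSubmit(new Item { Name = "item " + i });

    var sw = Stopwatch.StartNew();
    var changes = ctx.GetChangeSet(); // the call that appears to slow down
    sw.Stop();

    // If the elapsed time roughly quadruples each time n doubles,
    // the cost grows quadratically with the size of the change set.
    Console.WriteLine("{0,6} inserts: {1} ms", n, sw.ElapsedMilliseconds);
  }
}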

Answer 1:

In the end I decided to rewrite the batches so that each individual item is saved independently, all within one big transaction. In other words, instead of:

var b = new Batch { ... };
while (addNewItems) {
  ...
  var i = new BatchItem { ... };
  b.BatchItems.Add(i);
}
b.Insert(); // that's a function in my library that calls SubmitChanges()

...you have to do something like this:

context.BeginTransaction(); // another one of my library functions
try {
  var b = new Batch { ... };
  b.Insert(); // save the batch record immediately
  while (addNewItems) {
    ...
    var i = new BatchItem { ... };
    b.BatchItems.Add(i);
    i.Insert(); // send the SQL on each iteration
  }
  context.CommitTransaction(); // and only commit the transaction when everything is done.
} catch {
  context.RollbackTransaction();
  throw;
}

You can see why the first code block is just cleaner and more natural to use, and it's a pity I was forced into using the second structure...
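For comparison, here is a sketch of roughly the same "submit each item as you go, commit once at the end" pattern using only stock LinqToSql plus System.Transactions, in case you don't have helper methods like BeginTransaction() and Insert(). MyDataContext and its Batches table property are assumptions about the model; Batch, BatchItem and addNewItems come from the code above:

using System.Transactions;

using (var scope = new TransactionScope())
using (var context = new MyDataContext()) {
  var b = new Batch { ... };
  context.Batches.InsertOnSubmit(b);
  context.SubmitChanges(); // the batch row goes to the database right away
  while (addNewItems) {
    ...
    var i = new BatchItem { ... };
    b.BatchItems.Add(i);
    context.SubmitChanges(); // one tiny change set per item, so GetChangeSet() stays cheap
  }
  scope.Complete(); // nothing is committed until here; an exception rolls everything back
}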