Background
I have an application that receives periodic data dumps (XML files) and imports them into an existing database using Entity Framework 5 (Code First). The import goes through EF5 rather than, say, BULK INSERT or BCP because business rules that already exist in the entities must be applied.
Processing seems to be CPU-bound in the application itself: the extremely fast, write-cache-enabled disk I/O subsystem shows almost zero disk wait time throughout the process, and SQL Server shows no more than 8%-10% CPU time.
To improve efficiency, I built a pipeline using TPL Dataflow with components to:
Read & Parse XML file
|
V
Create entities from XML Node
|
V
Batch entities (BatchBlock, currently n=200)
|
V
Create new DbContext / insert batched entities / ctx.SaveChanges()
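For readers who want to picture the wiring, the stages above could be sketched roughly as follows. This is only a minimal illustration, not the actual import code: `Record`, `ImportPipeline`, `RunAsync`, the `record` element, the `name` attribute, and the `saveBatch` delegate are all hypothetical names invented for the example. The batch size of 200 and the parallelism of 12 on the save stage are the values mentioned in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
using System.Xml.Linq;

// Hypothetical entity standing in for the real Code First classes.
public class Record
{
    public string Name { get; set; }
}

public static class ImportPipeline
{
    // Wires up: parse XML -> create entity -> batch -> save.
    // Returns a task that completes once the save stage has drained.
    public static Task RunAsync(
        IEnumerable<string> xmlDocuments,
        Action<Record[]> saveBatch,   // assumption: persistence is injected by the caller
        int batchSize = 200,
        int saveParallelism = 12)
    {
        // Stage 1: parse each document and emit its <record> nodes (assumed element name).
        var parse = new TransformManyBlock<string, XElement>(
            xml => XDocument.Parse(xml).Root.Elements("record"));

        // Stage 2: turn each node into an entity (assumed "name" attribute).
        var create = new TransformBlock<XElement, Record>(
            node => new Record { Name = (string)node.Attribute("name") });

        // Stage 3: group entities into arrays of batchSize; the final batch may be smaller.
        var batch = new BatchBlock<Record>(batchSize);

        // Stage 4: persist batches, up to saveParallelism at a time.
        var save = new ActionBlock<Record[]>(
            saveBatch,
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = saveParallelism });

        // Propagate completion so finishing the head block drains the whole pipeline.
        var link = new DataflowLinkOptions { PropagateCompletion = true };
        parse.LinkTo(create, link);
        create.LinkTo(batch, link);
        batch.LinkTo(save, link);

        foreach (var xml in xmlDocuments)
        {
            parse.Post(xml);
        }
        parse.Complete();

        return save.Completion;
    }
}
```

In the import described here, `saveBatch` would create a new DbContext, add each entity in the batch to the appropriate DbSet, call SaveChanges(), and dispose the context, so that each batch works with its own context instance.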
I see a substantial increase in performance by doing this, but cannot get CPU utilization above about 60%.
Analysis
Suspecting some sort of resource contention, I ran the process using the VS2012 Profiler's Resource contention data (concurrency) mode.
The profiler shows me 52% contention for a resource labeled Handle 2. Drilling in, I see that the method creating the most contention for Handle 2 is
System.Data.Entity.Internal.InternalContext.SaveChanges()
Second place, at about 40% as many contentions as SaveChanges(), is
System.Data.Entity.DbSet`1.Add(!0)
Questions
- How can I figure out what Handle 2 really is (e.g. part of TPL, part of EF)?
- Does EF throttle calls to separate DbContext instances from separate threads? It seems there is a shared resource they are contending for.
- Is there anything that I can do to improve parallelism in this case?
UPDATE
For the run in question, the maximum degree of parallelism for the block that calls SaveChanges was set to 12 (I tried various values, including Unbounded, in previous runs).
UPDATE 2
Microsoft's EF team has provided feedback. See my answer for a summary.