Entity Framework and Parallelism

Posted 2020-02-22 00:47

Background

I have an application that receives periodic data dumps (XML files) and imports them into an existing database using Entity Framework 5 (Code First). The import goes through EF5 rather than, say, BULK INSERT or BCP, because business rules that already exist in the entities must be applied.

Processing seems to be CPU-bound in the application itself (the extremely fast, write-cache-enabled disk I/O subsystem shows almost zero disk wait time throughout the process, and SQL Server shows no more than 8%-10% CPU time).

To improve efficiency, I built a pipeline using TPL Dataflow with components to:

Read & Parse XML file
        |
        V
Create entities from XML Node
        |
        V
Batch entities (BatchBlock, currently n=200)
        |
        V
Create new DbContext / insert batched entities / ctx.SaveChanges()

This gives a substantial increase in performance, but I cannot get CPU utilization above about 60%.

Analysis

Suspecting some sort of resource contention, I ran the process in the VS2012 Profiler's "Resource contention data (concurrency)" mode.

The profiler shows me 52% contention for a resource labeled Handle 2. Drilling in, I see that the method creating the most contention for Handle 2 is

System.Data.Entity.Internal.InternalContext.SaveChanges()

In second place, with about 40% as many contentions as SaveChanges(), is

System.Data.Entity.DbSet`1.Add(!0)

Questions

  • How can I figure out what Handle 2 really is (e.g., part of TPL or part of EF)?
  • Does EF throttle calls to separate DbContext instances from separate threads? It seems there is a shared resource they are contending for.
  • Is there anything that I can do to improve parallelism in this case?

UPDATE

For the run in question, the maximum degree of parallelism for the task that calls SaveChanges is set to 12 (I tried various values including Unbounded in previous runs).

UPDATE 2

Microsoft's EF team has provided feedback. See my answer for a summary.

1 Answer

叼着烟拽天下 · 2020-02-22 01:07

The following summarizes my interaction with the Entity Framework team on this issue. I'll update the answer if more information becomes available.

  • The issue can be reproduced at Microsoft.
  • The handle contention is related to network I/O (even with SQL Server on localhost). Specifically, there is contention for the network I/O read buffer in System.Data.dll.
  • The EF team is now working with the SQL Connectivity team to better understand the issue.
  • There is as yet no guidance from Microsoft on how to minimize the impact of this contention.
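The finding that separate DbContext instances still contend for a read buffer inside System.Data.dll can be pictured with a toy model. This Python sketch is an illustration under assumptions, not a description of System.Data.dll internals: `SharedReadBuffer`, `Context` and `simulate` are hypothetical names, and the single lock is an assumed stand-in for whatever synchronization guards the real buffer. Each worker owns its own context, yet every `read()` is serialized behind one lock, so the time spent in that section does not shrink as workers are added.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class SharedReadBuffer:
    """Toy model of one read buffer, guarded by one lock, shared by every connection."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stats_lock = threading.Lock()
        self.inside = 0        # threads currently holding the buffer
        self.peak_inside = 0   # highest value `inside` ever reached
        self.contentions = 0   # acquisitions that found the buffer already held
        self.reads = 0

    def read(self):
        if not self._lock.acquire(blocking=False):
            with self._stats_lock:
                self.contentions += 1
            self._lock.acquire()  # wait for the current holder to finish
        try:
            self.inside += 1
            self.peak_inside = max(self.peak_inside, self.inside)
            self.reads += 1
            time.sleep(0.0005)  # simulated time spent using the buffer
            self.inside -= 1
        finally:
            self._lock.release()


class Context:
    """Hypothetical per-thread context, standing in for a separate DbContext instance."""

    def __init__(self, buffer):
        self.buffer = buffer

    def save_changes(self, round_trips):
        for _ in range(round_trips):
            self.buffer.read()


def simulate(workers=12, round_trips=20):
    shared = SharedReadBuffer()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(Context(shared).save_changes, round_trips) for _ in range(workers)]
    for f in futures:
        f.result()  # re-raise any worker exception
    return shared
```

Running `simulate()` typically reports a non-zero `contentions` count while `peak_inside` stays at 1: the contexts are independent objects, but every read goes through the same lock.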

UPDATE

This issue is now being tracked on CodePlex:

http://entityframework.codeplex.com/workitem/636?PendingVoteId=636
