Very poor performance for batch insert with SQL Se

2019-04-06 18:19发布

问题:

I'm inserting a lot of data into SQL Server CE 4.0 using Entity Framework 4.2 (code-first), and the performance is abysmal when compared to direct SQL insertion.

The model is very simple:

public class DocMember
{
    public DocMember() { this.Items = new List<DocItem>(); }

    public int Id { get; set; }

    public string Name { get; set; }
    public string MemberType { get; set; }
    public string AssemblyName { get; set; }

    public virtual IList<DocItem> Items { get; set; }
}

public class DocItem
{
    public int Id { get; set; }
    public DocMember Member { get; set; }
    public string PartType { get; set; }
    public string PartName { get; set; }
    public string Text { get; set; }
}

I have 2623 DocMembers and a total of of 7747 DocItems to insert, and I'm getting the following execution times:

With SQL: 00:00:02.8
With EF:  00:03:02.2

I can understand there's a bit of overhead with EF, but it is 65 times slower than SQL!

Perhaps there's a problem in my code, but it is quite straightforward and I can't see what could be wrong:

    private TimeSpan ImportMembersEF(IList<DocMember> members)
    {
        using (var db = new DocEntities())
        {
            db.Database.CreateIfNotExists();

            var sw = Stopwatch.StartNew();
            foreach (var m in members)
            {
                db.Members.Add(m);
            }

            db.SaveChanges();
            sw.Stop();
            return sw.Elapsed;
        }
    }

I also tried to call SaveChanges for each inserted item, or every 100 or 200 items, to no avail (it actually makes it worse).

Is there a way to improve the performance, or do I have to use SQL for batch inserts?


EDIT: for completeness, here's the code for the SQL insertion: http://pastebin.com/aeaC1KcB

回答1:

You can use my SqlCeBulkCopy library for loading bulk data, it mimics the SqlBulkCopy api: http://sqlcebulkcopy.codeplex.com



回答2:

It's slow because it's not batching the insert.

When you insert using identity on the db, it has to select the result ID out after each item to assign to the model. This makes it really slow.

Your adhoc SQL isn't selecting the ID out so when batched you can submit all the statements at once.

Altho written with NHibernate:

http://www.philliphaydon.com/2011/09/the-benefits-of-letting-the-orm-generate-the-identity-part-1/

I wrote about using ORM generated IDs vs SQL Generated IDs.