Entity Framework 5 poor performance

2019-02-24 11:29发布

问题:

I have 5 entities:

public class Album
{
    public int Id { get; set; }

    public string Title { get; set; }

    public virtual List<AlbumArtist> AlbumArtists { get; set; }
    public virtual List<Artist> Artists { get; set; }
    public virtual List<Genre> Genres { get; set; }
    public virtual List<Song> Songs { get; set; }

}

public class AlbumArtist
{
    public int Id { get; set; }

    public string Title { get; set; }

    public virtual List<Album> Albums { get; set; }
    public virtual List<Artist> Artists { get; set; }
    public virtual List<Genre> Genres { get; set; }
    public virtual List<Song> Songs { get; set; }
}

public class Artist
{
    public int Id { get; set; }

    public string Title { get; set; }

    public virtual List<AlbumArtist> AlbumArtists { get; set; }
    public virtual List<Album> Albums { get; set; }
    public virtual List<Genre> Genres { get; set; }
    public virtual List<Song> Songs { get; set; }
}

public class Genre
{
    public int Id { get; set; }

    public string Title { get; set; }

    public virtual List<AlbumArtist> AlbumArtists { get; set; }
    public virtual List<Album> Albums { get; set; }
    public virtual List<Artist> Artists { get; set; }
    public virtual List<Song> Songs { get; set; }
}

public class Song
{
    public int Id { get; set; }

    public string Title { get; set; }

    public virtual List<AlbumArtist> AlbumArtists { get; set; }
    public virtual List<Album> Albums { get; set; }
    public virtual List<Artist> Artists { get; set; }
    public virtual List<Genre> Genres { get; set; }
}

As you can see, there are a lot of many-to-many relationships. I populate my entities and then try to save them to DbContext in that way:

_albumArtists.ForEach(delegate(AlbumArtist albumArtist)
{
    if (albumArtist.Id == 0)
    {
            _dbContext.Entry(entity).State = EntityState.Added;
            _dbContext.SaveChanges();
    }
    else
    {
            _dbContext.Entry(entity).State = EntityState.Modified;
            _dbContext.SaveChanges();
    }
});
...

or in that way:

_albumArtists.ForEach(delegate(AlbumArtist albumArtist)
{
    if (albumArtist.Id == 0)
    {
            _dbContext.Entry(entity).State = EntityState.Added;
    }
    else
    {
            _dbContext.AlbumArtists.State = EntityState.Modified;
    }
});
_dbContext.SaveChanges();
...

It takes forever to save my entities to DbContext. I even tried to do the following:

Configuration.AutoDetectChangesEnabled = false;

But it didn't helped. By the way, there are for about 17 000 Songs and 1 700 Albums.

What is wrong???

Please help!

PS

Here is my full code: https://github.com/vjacheslavravdin/PsyTrance/blob/master/PsyTrance/Program.cs Maybe you can suggest how to simplify it.

Thanks!

回答1:

Firstly a couple of clarifications:

EF is not significantly slower than other methods for batch based operations. in my tests you can get perhaps a 50% improvement with a raw SQL Command and maybe up to 10 times faster with SQL Bulk copy but EF is not much slower than comparative methods as a general rule (though often perceived as very slow). For most applications EF will give suitable performance numbers even in batch scenarios given the correct tuning. (see my article here: http://blog.staticvoid.co.nz/2012/3/24/entity_framework_comparative_performance and http://blog.staticvoid.co.nz/2012/8/17/mssql_and_large_insert_statements)

Because of the way EF does change tracking its has potential to far exceed the performance of how most people would write SqlCommand based insert statements (theres a whole lot of caviats to do with query planning, round trips and transactions that make it pretty hard to write optimally performing bulk insert statements). I have proposed these additions to EF here (http://entityframework.codeplex.com/discussions/377636) but havent implemented them yet.

You are exactly right with your decision to turn off auto detect changes, each .Add or .Attach action with detect changes on enumerates the tracking graph, therefore if you're adding 17k additions on the same context you will need to enumerate the graph 17000 times over a total of 17000 + 16999 + ...+ 2 + 1 = 144,500,000 entities, no wonder its taking so long right? (see my article here: http://blog.staticvoid.co.nz/2012/5/7/entityframework_performance_and_autodetectchanges)

Save changes always needs to enumerate the tracking graph (it calls detect changes internally) so your first way is going to be slow as its actually going to be doing the same number of tracking calls as above.

The second way is much better but it still has a pretty major flaw which I imagine is twofold, firstly the graph is really big when you go to save changes (bigger graphs have exponentially higher tracking times), and secondly its going to take up a lot of memory to persist the whole graph at once especially given that EF stores two copies of each of your entities.

A better way is to persist your graph in chunks. some

//With Auto detect changes off.
foreach(var batch in batches)//keep batch size below 1000 items, play around with the numbers a little
{
    using(var ctx = new MyContext())//make sure you create a new context per batch.
    {
        foreach(var entity in batch){
             ctx.Entities.Add(entity);
        }
        ctx.SaveChanges();
    }
}

I would expect you should target around 17-30s to do all 17k rows.

by doing this with raw SQL commands you may be able to get this to around 12-20s;

with a reimplementation with bulk copy you could probably get this down to 2-5s