Recently, I have been doing some simple EF work. Very simple. First,
List<Book> books = entity.Books.Where(c => c.processed == false).ToList();
then
foreach (var book in books)
{
    // do some logic, update some properties
    book.ISBN = "ISBN " + randomNumber();
    book.processed = true;
    entity.SaveChanges();
}
I put entity.SaveChanges() inside the foreach because it is a large list, around 100k records. If a record is processed without a problem, I flag it by setting book.processed = true, so that if the process is interrupted by an exception, I don't have to process the good records again next time.
Everything seemed OK to me, and it is fast when you process hundreds of records. But when we moved to 100k records, entity.SaveChanges() became very, very slow, around 1-3 seconds per record. We then kept the entity model but replaced entity.SaveChanges() with a classic SqlHelper.ExecuteNonQuery("update_book", sqlparams), and it is very fast.
Could anyone tell me why Entity Framework processes this so slowly? And if I still want to use entity.SaveChanges(), what is the best way to improve the performance?
Thank you
I would take the SaveChanges() call outside of the foreach. Since the books are already tracked by the context, you can save once after the loop and EF will work better with the resulting code.
The Books collection is a set on the context, and EF is designed to optimize updates/creates/deletes against the back-end database. If you do this, I'd be curious whether it helps.
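A minimal sketch of that, assuming the same entity context, Book entity, and randomNumber() helper from the question:

// Modify all the tracked books first, then save once: one SaveChanges call
// (and one transaction) instead of one call per record.
List<Book> books = entity.Books.Where(c => c.processed == false).ToList();
foreach (var book in books)
{
    book.ISBN = "ISBN " + randomNumber();
    book.processed = true;
}
entity.SaveChanges();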
I too would advise you to take the SaveChanges() call out of the loop: calling it n times means n round trips to the database, and the context has to run its change detection and validation n times as well.
Turn off change tracking before you perform your inserts. This will improve your performance significantly (orders of magnitude). Putting SaveChanges() outside your loop will help as well, but turning off change tracking will help even more. See this page for some more information.
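In EF6 this is typically done by turning off automatic change detection on the context; a rough sketch, reusing the names from the question:

entity.Configuration.AutoDetectChangesEnabled = false;
try
{
    List<Book> books = entity.Books.Where(c => c.processed == false).ToList();
    foreach (var book in books)
    {
        book.ISBN = "ISBN " + randomNumber();
        book.processed = true;
    }
    // With automatic detection off, SaveChanges no longer rescans every tracked
    // entity on each call, so run change detection once ourselves before saving.
    entity.ChangeTracker.DetectChanges();
    entity.SaveChanges();
}
finally
{
    entity.Configuration.AutoDetectChangesEnabled = true;
}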
"AsNoTracking" works for me
ex:
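// requires: using System.Data.Entity;
var books = entity.Books.AsNoTracking()
                        .Where(c => c.processed == false)
                        .ToList();
foreach (var book in books)
{
    book.ISBN = "ISBN " + randomNumber();
    book.processed = true;
    // Attach the detached entity and mark it Modified so SaveChanges emits an UPDATE.
    entity.Entry(book).State = EntityState.Modified;
}
entity.SaveChanges();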
Without "AsNoTracking" the updates is very slow.
The Entity Framework, in my opinion, is a poor choice for BULK operations both from a performance and a memory consumption standpoint. Once you get beyond a few thousand records, the SaveChanges method really starts to break down.
You can try to partition your work into smaller transactions, but again, I think you're working too hard to make EF fit this job.
A much better approach is to leverage the BULK operations that are already provided by your DBMS. SQL Server provides bulk copy via SqlBulkCopy in .NET. Oracle provides bulk copy for Oracle.DataAccess (the unmanaged provider); for Oracle.ManagedDataAccess, the bulk copy library unfortunately isn't available. But you can create an Oracle stored procedure using BULK COLLECT/FORALL that inserts thousands of records within seconds, with a much lower memory footprint in your application. Within the .NET app you can pass PL/SQL associative arrays as parameters, etc.
The benefit of leveraging the BULK facilities within your DBMS is reducing the context switches between your application, the query processor and the database engine.
I'm sure other database vendors provide something similar.
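For SQL Server, a rough sketch of the staging-table pattern (the books list, table name, and column names here are assumptions, not the poster's actual schema):

// using System.Data; using System.Data.SqlClient;
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    // Copy the updated values into a DataTable that mirrors a staging table.
    var staging = new DataTable();
    staging.Columns.Add("BookId", typeof(int));
    staging.Columns.Add("ISBN", typeof(string));
    foreach (var book in books)
        staging.Rows.Add(book.BookId, book.ISBN);

    using (var create = new SqlCommand(
        "CREATE TABLE #BookStaging (BookId int PRIMARY KEY, ISBN nvarchar(32))", conn))
        create.ExecuteNonQuery();

    // One bulk insert instead of 100k individual statements.
    using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "#BookStaging" })
        bulk.WriteToServer(staging);

    // One set-based UPDATE applies all the changes.
    using (var update = new SqlCommand(
        @"UPDATE b SET b.ISBN = s.ISBN, b.processed = 1
          FROM Books b JOIN #BookStaging s ON b.BookId = s.BookId", conn))
        update.ExecuteNonQuery();
}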
Use this NuGet package: Z.EntityFramework.Extensions.
It has extension methods that you can call on the DbContext, like DbContext.BulkSaveChanges, which works amazingly fast.
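A minimal usage sketch (same context and loop as in the question; BulkSaveChanges comes from the package as an extension method on DbContext):

foreach (var book in books)
{
    book.ISBN = "ISBN " + randomNumber();
    book.processed = true;
}
// Drop-in replacement for SaveChanges(); it batches the UPDATEs instead of
// sending one round trip per row.
entity.BulkSaveChanges();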