I have to update my Doctrine entities to match records inside a (potentially very large) XML file. I also have to update the ManyToMany associations according to the data in the XML. This is what I do inside a loop:
- get the data from the XML
- get the entity from the DB (if it does not exist, create a new one)
- set the new entity properties
- get the current entity associations (the getter returns an `ArrayCollection` object)
- clear all associations (by calling `ArrayCollection::clear()`)
- set the new associations (by calling `ArrayCollection::add()` in a sub-loop)
- persist the entity with the EntityManager

After the loop I call `EntityManager::flush()`.
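A minimal sketch of that loop, assuming `$em` is the `EntityManager`, `$xmlRecords` is the already-parsed XML data, and `Product`/`Category` with a `code` field are hypothetical entities standing in for the real ones:

```php
foreach ($xmlRecords as $record) {
    // get the entity from the DB, or create a new one if it does not exist
    $product = $em->getRepository(Product::class)
        ->findOneBy(['code' => $record['code']]) ?? new Product();

    // set the new entity properties
    $product->setCode($record['code']);
    $product->setName($record['name']);

    // get the current associations, clear them, then add the new ones in a sub-loop
    $categories = $product->getCategories();
    $categories->clear();
    foreach ($record['categoryIds'] as $categoryId) {
        $categories->add($em->getReference(Category::class, $categoryId));
    }

    $em->persist($product);
}

$em->flush();
```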
The problem is that flushing generates a large number of queries instead of updating/inserting/deleting multiple rows at once. For every entity the following queries are executed:
- SELECT to get entity from DB
- UPDATE to update entity properties (this is actually skipped now as no properties changed ... yet)
- DELETE to clear previous associations
- INSERT to insert new associations
So in total, for 305 records in the XML I get 915 queries (I guess it could go up to 1220 queries if all entities had changed), which makes the import very slow.
I could take advantage of the IdentityMap and pre-fetch the entities before the loop (see the sketch below), but the UPDATE/DELETE/INSERT queries would still remain.
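For the pre-fetch part, a hedged sketch (same hypothetical `Product` entity and `code` field as above): one `SELECT ... WHERE code IN (...)` before the loop, with the results indexed by the business key, removes the per-record SELECTs:

```php
// one query instead of one SELECT per record; entity and field names are assumptions
$codes = array_column($xmlRecords, 'code');

$productsByCode = [];
foreach ($em->getRepository(Product::class)->findBy(['code' => $codes]) as $product) {
    $productsByCode[$product->getCode()] = $product;
}

foreach ($xmlRecords as $record) {
    $product = $productsByCode[$record['code']] ?? new Product();
    // ... update properties and associations as before ...
}
```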
- Is there a way to let the flush method optimize the queries better (use multi-inserts, WHERE IN instead of multiple DELETE queries, etc.)?
- Is this normal behaviour of the flush method, or am I doing something wrong?
- Perhaps the problem is in the way I update the associations of an entity. Is there a better way to do this (instead of the "get/clear/add" approach)?
- I am aware that Doctrine is not intended for mass batch processing, but I think using it for XML imports is the best way to avoid DB inconsistencies that could appear with a non-ORM approach. Is that right?
- If the approach above is wrong, how should I solve the problem?
Not sure if this directly answers the question posed by the original poster, but hopefully it will help others with Doctrine speed issues when flushing.
With regard to flush speed, make sure your Xdebug profiler is not on.
As an example of how much this affected a Doctrine flush operation in my case: it took 55 seconds for 3000 records, whereas with the profiler turned off it took 5 seconds!
You're doing it right -- it's just slow, because the added abstraction of the ORM means you can't make the sorts of optimizations you'd like.
That said, the EntityManager does get slow on transactions that large. If you don't absolutely need them all in one big transaction, you can probably get more performant code by flush()ing and then clear()ing the EM every 20-200 iterations of your loop.
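A sketch of that batching, following the pattern from the Doctrine batch-processing documentation (the batch size of 50 is arbitrary):

```php
$batchSize = 50; // tune somewhere in the 20-200 range suggested above

foreach ($xmlRecords as $i => $record) {
    // ... build/update the entity as before ...
    $em->persist($product);

    if (($i + 1) % $batchSize === 0) {
        $em->flush();
        $em->clear(); // detaches all managed entities and keeps memory usage flat
    }
}

$em->flush(); // flush the last, partial batch
$em->clear();
```

Note that `clear()` detaches everything, so any entities fetched before the loop have to be re-fetched for the next batch.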
If that doesn't get you enough performance, the only alternative that I can think of is to revert to custom code that runs custom SQL directly against your DBMS.
I know this isn't a great answer, but at least I can tell you that you're not crazy.
------ edit ------
From the official Doctrine2 article on batch processing:
Also, there is a significant difference in performance when using a remote vs. a local database, as the overhead of sending each query to a remote server is quite large. The overhead is much lower when using a local database, thanks to transactions and DB optimizations (e.g. 70 s lowered to 300 ms in the case of the example in the question).
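If the import is not already running inside one transaction, a sketch of wrapping it explicitly (assuming the same `$em` as above; each `flush()` otherwise opens and commits its own transaction):

```php
$connection = $em->getConnection();

$connection->beginTransaction();
try {
    // ... the import loop with periodic flush()/clear() goes here ...
    $em->flush();
    $connection->commit();
} catch (\Throwable $e) {
    $connection->rollBack();
    throw $e;
}
```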