Bulk Insert in Symfony and Doctrine: How to select

I am working on a web app using Symfony 2.7 and Doctrine. A Symfony command is used to perform an update of a large number of entities.

I followed the Doctrine guidelines and do not call $entityManager->flush() for every single entity.

This is the Doctrine example code:

<?php
$batchSize = 20;
for ($i = 1; $i <= 10000; ++$i) {
    $user = new CmsUser;
    $user->setStatus('user');
    $user->setUsername('user' . $i);
    $user->setName('Mr.Smith-' . $i);
    $em->persist($user);
    if (($i % $batchSize) === 0) {
        $em->flush();
    }
}
$em->flush(); //Persist objects that did not make up an entire batch

The guidelines say:

You may need to experiment with the batch size to find the size that works best for you. Larger batch sizes mean more prepared statement reuse internally but also mean more work during flush.

So I tried different batch sizes. The larger the batch size, the faster the command completes its task.

Thus the question is: What are the downsides of large batch sizes? Why not call $entityManager->flush() only once, after all entities have been updated?

The documentation just says that larger batch sizes "mean more work during flush". But why/when could this be a problem?

The only downside I can see is an exception during the update: if the script stops before the pending changes were flushed, those changes are lost. Is this the only limitation?

Tags: symfony doctrine-orm

4 answers

Deceive 欺骗

#2 · 2019-05-01 20:07

What are the downsides of large batch sizes?

Large batch sizes may use a lot of memory, for example if you create 10,000 entities. If you don't save the entities in batches, they accumulate in memory, and if the program reaches the memory limit the whole script may crash.

Why not use $entityManager->flush() only once, after all entities have been updated?

It's possible, but holding 10,000 entities in memory before calling flush() once will use more memory than saving them 100 at a time. It may also take more time.

The docu just says, that larger batch sizes "mean more work during flush". But why/when could this be a problem?

If you don't have any performance issues with the biggest batch sizes, it's probably because your data is not big enough to fill the memory or disrupt PHP's memory management.

So the batch size depends on multiple factors, mostly memory usage vs. time. If the script consumes too much RAM, the batch size has to be lowered. But really small batches may take more time than bigger ones. So you have to run multiple tests to adjust this size so that it uses most of the available memory but no more.
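One way to run such tests is to time a complete run and record its peak memory for each candidate batch size. The following is only a sketch: the helper name measureRun and the $job callable (which would wrap your own import with one particular batch size) are hypothetical.

```php
<?php
/**
 * Run a batch job once and report wall-clock time and peak memory.
 *
 * @param callable $job Hypothetical callable that performs the whole
 *                      import with one particular batch size.
 *
 * @return array{seconds: float, peak_mb: float}
 */
function measureRun(callable $job): array
{
    $start = microtime(true);
    $job();

    return [
        // Elapsed wall-clock time in seconds.
        'seconds' => microtime(true) - $start,
        // Peak memory allocated from the system, in megabytes.
        'peak_mb' => round(memory_get_peak_usage(true) / 1048576, 1),
    ];
}
```

The peak is tracked for the whole PHP process, so each batch size is best measured in a separate run of the command.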

I don't have any proof, but I remember working with thousands of entities. When I used only one flush(), I saw the progress bar slowing down; it looked like my program was getting slower as I added more and more entities to memory.

Evening l夕情丶

#3 · 2019-05-01 20:12

If the flush takes too much time, you might exceed the server's maximum execution time and lose the connection.

beautiful°

#4 · 2019-05-01 20:23

From my experience, 100 entities per batch worked great. Depending on the entity, 200 was too much for some, while for others I could do 1000.

To properly insert in batches, you will need to call:

$em->clear();

after each of your flushes. The reason is that Doctrine does not free the objects it flushes to the DB. This means that if you don't "clear" them, memory consumption will keep increasing until you hit your PHP memory limit and crash your operation.
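A minimal sketch of that flush-then-clear loop follows. The function name persistInBatches is made up for illustration, and $em is assumed to be a Doctrine EntityManager; it is left without a native type hint only so that the snippet runs on its own, and in a real project you would type-hint it.

```php
<?php
/**
 * Persist entities in batches, flushing and clearing after each full batch.
 *
 * @param object   $em        Assumed to be a Doctrine EntityManager
 *                            (only persist(), flush() and clear() are used).
 * @param iterable $entities  New entities to insert.
 * @param int      $batchSize Entities per flush; tune by measurement.
 *
 * @return int Number of entities persisted.
 */
function persistInBatches($em, iterable $entities, int $batchSize = 100): int
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    $count = 0;
    foreach ($entities as $entity) {
        $em->persist($entity);
        ++$count;

        if ($count % $batchSize === 0) {
            $em->flush(); // write the pending inserts
            $em->clear(); // detach all managed entities so PHP can free them
        }
    }

    // Write whatever did not fill a complete batch.
    $em->flush();
    $em->clear();

    return $count;
}
```

Keep in mind that clear() detaches every managed entity, so any object you still hold a reference to is no longer tracked afterwards.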

I would also recommend against raising the PHP memory limit. If you do, you risk creating huge lag on your server, which could increase the number of open connections and then crash it.

It is also recommended to run batch operations outside of the web server's upload form page. Save the data in a blob and process it later with a cron job that runs the batch at the desired time (outside of the web server's peak usage time).

Animai°情兽

#5 · 2019-05-01 20:25

As suggested in the Doctrine documentation, the ORM is not the best tool for batch operations.

Unless your entities need some specific logic (like listeners), avoid the ORM and use DBAL directly.
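As a rough sketch of what going through DBAL could look like, the function below inserts plain rows inside one transaction. The function name insertRows, the table name and the column names are assumptions for illustration; $conn is assumed to be a Doctrine\DBAL\Connection (for example from $em->getConnection()) and is left untyped only so that the snippet runs on its own.

```php
<?php
/**
 * Insert rows through DBAL inside a single transaction, bypassing the ORM.
 *
 * @param object $conn  Assumed to be a Doctrine\DBAL\Connection.
 * @param string $table Table name (hypothetical, e.g. 'cms_user').
 * @param array  $rows  List of column => value maps.
 *
 * @return int Number of rows inserted.
 */
function insertRows($conn, string $table, array $rows): int
{
    $conn->beginTransaction();
    try {
        foreach ($rows as $row) {
            $conn->insert($table, $row); // one parameterized INSERT per row
        }
        $conn->commit();
    } catch (\Throwable $e) {
        $conn->rollBack(); // undo the partial batch on failure
        throw $e;
    }

    return count($rows);
}
```

No entities are created this way, so there is nothing to clear, but entity listeners and lifecycle callbacks do not run either.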
