Symfony : Doctrine data fixture : how to handle la

2020-08-25 05:17发布

问题:

I am trying to insert (in a mySQL database) datas from a "large" CSV file (3Mo / 37000 lines / 7 columns) using doctrine data fixtures.

The process is very slow and at this time I could not succeed (may be I had to wait a little bit more).

I suppose that doctrine data fixtures are not intended to manage such amount of datas ? Maybe the solution should be to import directly my csv into database ?

Any idea of how to proceed ?

Here is the code :

<?php

namespace FBN\GuideBundle\DataFixtures\ORM;

use Doctrine\Common\DataFixtures\AbstractFixture;
use Doctrine\Common\DataFixtures\OrderedFixtureInterface;
use Doctrine\Common\Persistence\ObjectManager;
use FBN\GuideBundle\Entity\CoordinatesFRCity as CoordFRCity;

class CoordinatesFRCity extends AbstractFixture implements OrderedFixtureInterface
{
    public function load(ObjectManager $manager)
    {
        $csv = fopen(dirname(__FILE__).'/Resources/Coordinates/CoordinatesFRCity.csv', 'r');

        $i = 0;

        while (!feof($csv)) {
            $line = fgetcsv($csv);

            $coordinatesfrcity[$i] = new CoordFRCity();
            $coordinatesfrcity[$i]->setAreaPre2016($line[0]);
            $coordinatesfrcity[$i]->setAreaPost2016($line[1]);
            $coordinatesfrcity[$i]->setDeptNum($line[2]);
            $coordinatesfrcity[$i]->setDeptName($line[3]);
            $coordinatesfrcity[$i]->setdistrict($line[4]);
            $coordinatesfrcity[$i]->setpostCode($line[5]);
            $coordinatesfrcity[$i]->setCity($line[6]);

            $manager->persist($coordinatesfrcity[$i]);

            $this->addReference('coordinatesfrcity-'.$i, $coordinatesfrcity[$i]);


            $i = $i + 1;
        }

        fclose($csv);

        $manager->flush();
    }

    public function getOrder()
    {
        return 1;
    }
}

回答1:

Two rules to follow when you create big batch imports like this:

  • Disable SQL Logging: ($manager->getConnection()->getConfiguration()->setSQLLogger(null);) to avoid huge memory loss.

  • Flush and clear frequently instead of only once at the end. I suggest you add if ($i % 25 == 0) { $manager->flush(); $manager->clear() } inside your loop, to flush every 25 INSERTs.

EDIT: One last thing I forgot: don't keep your entities inside variables when you don't need them anymore. Here, in your loop, you only need the current entity that is being processed, so don't store previous entity in a $coordinatesfrcity array. This might lead you to memory overflow if you keep doing that.



回答2:

There is a great example in the Docs: http://doctrine-orm.readthedocs.org/projects/doctrine-orm/en/latest/reference/batch-processing.html

Use a modulo (x % y) expression to implement batch processing, this example will insert 20 at a time. You may be able to optimise this depending on your server.

$batchSize = 20;
for ($i = 1; $i <= 10000; ++$i) {
    $user = new CmsUser;
    $user->setStatus('user');
    $user->setUsername('user' . $i);
    $user->setName('Mr.Smith-' . $i);
    $em->persist($user);
    if (($i % $batchSize) === 0) {
        $em->flush();
        $em->clear(); // Detaches all objects from Doctrine!
    }
}
$em->flush(); //Persist objects that did not make up an entire batch
$em->clear();


回答3:

For fixtures which need lots of memory but don't depend on each other, I get around this problem by using the append flag to insert one entity (or smaller group of entities) at a time:

bin/console doctrine:fixtures:load --fixtures="memory_hungry_fixture.file" --append

Then I write a Bash script which runs that command as many times as I need.

In your case, you could extend the Fixtures command and have a flag which does batches of entities - the first 1000 rows, then the 2nd 1000, etc.