How to ignore duplicate documents when using insertMany

Posted 2020-08-09 06:16

Question:

I am using the mongo php library and trying to insert some old data into MongoDB. I used the insertMany() method and passed a huge array of documents, which may contain duplicates on unique indexes.

Let's say I have a users collection with these indexes:

[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "test.users"
    },
    {
        "v" : 1,
        "unique" : true,
        "key" : {
            "email" : 1
        },
        "name" : "shop_id_1_title_1",
        "ns" : "test.users"
    }
]

If there's a duplicate document, a MongoDB\Driver\Exception\BulkWriteException is raised and the process stops. I want a way to skip duplicate documents (and prevent the exception from being raised) and continue inserting the other documents.

I found a flag called continueOnError in the php.net documentation that does the trick, but it does not seem to work with this library.

The example from php.net:

<?php

$con = new Mongo;
$db = $con->demo;

$doc1 = array(
        '_id' => new MongoId('4cb4ab6d7addf98506010001'),
        'id' => 1,
        'desc' => "ONE",
);
$doc2 = array(
        '_id' => new MongoId('4cb4ab6d7addf98506010002'),
        'id' => 2,
        'desc' => "TWO",
);
$doc3 = array(
        '_id' => new MongoId('4cb4ab6d7addf98506010002'), // same _id as above
        'id' => 3,
        'desc' => "THREE",
);
$doc4 = array(
        '_id' => new MongoId('4cb4ab6d7addf98506010004'),
        'id' => 4,
        'desc' => "FOUR",
);

$c = $db->selectCollection('c');
$c->batchInsert(
    array($doc1, $doc2, $doc3, $doc4),
    array('continueOnError' => true)
);

And this is how I tried to use the flag with the mongo php library:

<?php

$collection = (new MongoDB\Client)->test->users;

$collection->insertMany([
    [
        'username' => 'admin',
        'email' => 'admin@example.com',
        'name' => 'Admin User',
    ],
    [
        'username' => 'test',
        'email' => 'test@example.com',
        'name' => 'Test User',
    ],
    [
        'username' => 'test 2',
        'email' => 'test@example.com',
        'name' => 'Test User 2',
    ],
],
[
    'continueOnError' => true    // This option is not working
]);

The code above still raises the exception and does not seem to work. Is there another option flag, or any other way to do this?

Answer 1:

Try replacing the 'continueOnError' option with 'ordered' set to false. According to the documentation, when the ordered option is false, insertMany continues writing the remaining documents even if a single write fails.

Here is the docs link: insertMany
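
One caveat worth sketching: even with 'ordered' set to false, the driver still throws a BulkWriteException once all writes have been attempted, so the exception has to be caught. The snippet below is a minimal sketch under that assumption, not a definitive implementation. The helper names insertIgnoringDuplicates and isDuplicateKeyCode are hypothetical and not part of the library. The value 11000 is the server's duplicate key error code.

```php
<?php

use MongoDB\Collection;
use MongoDB\Driver\Exception\BulkWriteException;

// Server error code reported for a unique index violation.
const DUPLICATE_KEY_CODE = 11000;

// Hypothetical helper: true when a write error code means "duplicate key".
function isDuplicateKeyCode(int $code): bool
{
    return $code === DUPLICATE_KEY_CODE;
}

// Hypothetical helper: inserts $documents unordered, tolerates duplicate
// key errors, and returns how many documents were actually inserted.
function insertIgnoringDuplicates(Collection $collection, array $documents): int
{
    try {
        // 'ordered' => false makes the server attempt every document
        // instead of stopping at the first failure.
        $result = $collection->insertMany($documents, ['ordered' => false]);

        return $result->getInsertedCount();
    } catch (BulkWriteException $e) {
        $writeResult = $e->getWriteResult();

        // Do not swallow write concern failures.
        if ($writeResult->getWriteConcernError() !== null) {
            throw $e;
        }

        // Rethrow if any failure is something other than a duplicate key.
        foreach ($writeResult->getWriteErrors() as $error) {
            if (!isDuplicateKeyCode($error->getCode())) {
                throw $e;
            }
        }

        return (int) $writeResult->getInsertedCount();
    }
}
```

After loading Composer's autoloader, this could be called as insertIgnoringDuplicates((new MongoDB\Client)->test->users, $documents). Rethrowing on any error other than a duplicate key keeps unrelated failures, such as validation errors, visible.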