Is there a way to convert a nested document structure into an array? Below is an example:
Input:
"experience" : {
    "0" : {
        "duration" : "3 months",
        "end" : "August 2012",
        "organization" : {
            "0" : {
                "name" : "Bank of China",
                "profile_url" : "http://www.linkedin.com/company/13801"
            }
        },
        "start" : "June 2012",
        "title" : "Intern Analyst"
    }
},
Expected Output:
"experience" : [
    {
        "duration" : "3 months",
        "end" : "August 2012",
        "organization" : {
            "0" : {
                "name" : "Bank of China",
                "profile_url" : "http://www.linkedin.com/company/13801"
            }
        },
        "start" : "June 2012",
        "title" : "Intern Analyst"
    }
],
Currently I am using a script that iterates over each document, converts the field to an array, and then updates the document. It takes a long time. Is there a better way of doing this?
You still need to iterate over the content, but you should write the changes back using bulk operations.
For MongoDB 2.6 and greater:
var bulk = db.collection.initializeUnorderedBulkOp(),
    count = 0;

db.collection.find({
    "$where": "return !Array.isArray(this.experience)"
}).forEach(function(doc) {
    bulk.find({ "_id": doc._id }).updateOne({
        "$set": { "experience": [doc.experience["0"]] }
    });
    count++;

    // Write once in 1000 entries
    if ( count % 1000 == 0 ) {
        bulk.execute();
        bulk = db.collection.initializeUnorderedBulkOp();
    }
})

// Write the remaining
if ( count % 1000 != 0 )
    bulk.execute();
Or, in MongoDB 3.2 and greater, the bulkWrite() method is preferred:
var ops = [];

db.collection.find({
    "$where": "return !Array.isArray(this.experience)"
}).forEach(function(doc) {
    ops.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": { "experience": [doc.experience["0"]] } }
        }
    });

    // Write once in 1000 entries
    if ( ops.length == 1000 ) {
        db.collection.bulkWrite(ops,{ "ordered": false })
        ops = [];
    }
})

// Write the remaining
if ( ops.length > 0 )
    db.collection.bulkWrite(ops,{ "ordered": false });
When writing back to the database while iterating a cursor, unordered bulk write operations are the way to go. Each batch of 1000 requests costs only one write and one response, which removes a lot of overhead, and "unordered" means the server does not have to apply the writes serially and can execute them in parallel. Together this makes the update faster.
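Note that the updates above keep only the entry stored under key "0". If some documents hold more than one entry ("0", "1", "2", ...), a small helper can collect all of them. This is a minimal sketch, assuming the keys are numeric strings as in the question; the name numericKeysToArray is hypothetical and not part of MongoDB:

```javascript
// Hypothetical helper: turn { "0": a, "1": b, ... } into [a, b, ...].
// Keys that are not plain non-negative integers are ignored.
function numericKeysToArray(obj) {
    return Object.keys(obj)
        .filter(function(key) { return /^\d+$/.test(key); })
        // Sort numerically so that "10" comes after "2"
        .sort(function(a, b) { return Number(a) - Number(b); })
        .map(function(key) { return obj[key]; });
}
```

With this in place, the "$set" in either loop would become { "experience": numericKeysToArray(doc.experience) } instead of [doc.experience["0"]].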
See if one of these queries works with your MongoDB version.
For MongoDB version 3.2+:
db.doc.aggregate([
{$project:{experience:["$experience.0"]}}
])
MongoDB < 3.2:
db.doc.aggregate([
{$group: {_id:"$_id", experience:{$push:"$experience.0"}}}
])
It should return your document in this form (the aggregation only produces output; it does not modify the stored document):
{
    "_id" : ObjectId("56f1b046a65ea8a72c34839c"),
    "experience" : [
        {
            "duration" : "3 months",
            "end" : "August 2012",
            "organization" : {
                "0" : {
                    "name" : "Bank of China",
                    "profile_url" : "http://www.linkedin.com/company/13801"
                }
            },
            "start" : "June 2012",
            "title" : "Intern Analyst"
        }
    ]
}
If you want to alter the documents in the collection permanently, you can do so with the aggregation framework by adding an $out stage.
Let's assume your collection name is doc:
db.doc.aggregate([
{$group: {_id:"$_id", experience:{$push:"$experience.0"}}},
{$out: "doc"}
])
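The query above rewrites the doc collection with the pipeline output. Note that $out replaces the whole collection, and this $group stage keeps only _id and experience, so any other fields in your documents are dropped.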
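On newer servers the conversion can also be done in place without sending documents through the client. This is a minimal sketch, assuming MongoDB 4.2 or later (where update commands accept an aggregation pipeline) and a collection named doc as above. The variable names filter and pipeline are only for illustration. It keeps the other fields of each document and converts every entry of experience, not only key "0":

```javascript
// Match only documents whose "experience" is still an embedded document,
// so the update is safe to re-run after some documents were converted.
var filter = { "$expr": { "$eq": [ { "$type": "$experience" }, "object" ] } };

// $objectToArray turns { "0": {...}, "1": {...} } into
// [ { k: "0", v: {...} }, { k: "1", v: {...} } ]; $map keeps just the values.
var pipeline = [
    { "$set": {
        "experience": {
            "$map": {
                "input": { "$objectToArray": "$experience" },
                "as": "entry",
                "in": "$$entry.v"
            }
        }
    } }
];

// Only runs inside the mongo shell, where "db" is defined.
if (typeof db !== "undefined") {
    db.doc.updateMany(filter, pipeline);
}
```

The entries come out in the order the keys are stored in the document. Try it on a copy of the collection first.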