I've been studying up on MongoDB, and I understand it is highly recommended that document structures be completely built out (pre-allocated) at the point of insert, so that future changes to the document do not require it to be moved around on disk. Does this apply when using $addToSet or $push?
For example, say I have the following document:
{
    "_id" : "rsMH4GxtduZZfxQrC",
    "createdAt" : ISODate("2015-03-01T12:08:23.007Z"),
    "market" : "LTC_CNY",
    "type" : "recentTrades",
    "data" : [
        {
            "date" : "1422168530",
            "price" : 13.8,
            "amount" : 0.203,
            "tid" : "2435402",
            "type" : "buy"
        },
        {
            "date" : "1422168529",
            "price" : 13.8,
            "amount" : 0.594,
            "tid" : "2435401",
            "type" : "buy"
        },
        {
            "date" : "1422168529",
            "price" : 13.79,
            "amount" : 0.594,
            "tid" : "2435400",
            "type" : "buy"
        }
    ]
}
And I am using one of the following commands to add a new array of objects (newData) to the data field:
$addToSet to add to the end of the array:
Collection.update(
    { _id: 'rsMH4GxtduZZfxQrC' },
    {
        $addToSet: {
            data: {
                $each: newData
            }
        }
    }
);
$push (with $position) to add to the front of the array:
Collection.update(
    { _id: 'rsMH4GxtduZZfxQrC' },
    {
        $push: {
            data: {
                $each: newData,
                $position: 0
            }
        }
    }
);
The data array in the document will grow due to new objects that were added from newData. So will this type of document update cause the document to be moved around on the disk?
For this particular system, the data array in these documents can grow to upwards of 75k objects within, so if these documents are indeed being moved around on disk after every $addToSet or $push update, should the document be defined with 75k nulls (data: [null,null...null]) on insert, and then perhaps use $set to replace the values over time? Thanks!
I understand it is highly recommended that document structures be completely built out (pre-allocated) at the point of insert, so that future changes to the document do not require it to be moved around on disk. Does this apply when using $addToSet or $push?
It's recommended if it's feasible for the use case, which it usually isn't. Time series data is a notable exception. It doesn't really apply with $addToSet and $push because they tend to increase the size of the document by growing an array.
the data array in these documents can grow to upwards of 75k objects within
Stop. Are you sure you want constantly growing arrays with tens of thousands of entries? Are you going to query wanting specific entries back? Are you going to index any fields in the array entries? You probably want to rethink your document structure. Maybe you want each data entry to be a separate document with fields like market, type, createdAt replicated in each? You wouldn't be worrying about document moves.
Why will the array grow to 75K entries? Can you do fewer entries per document? Is this time series data? It's great to be able to preallocate documents and do in-place updates with the MMAPv1 storage engine, but it's not feasible for every use case and it's not a requirement for MongoDB to perform well.
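As a rough illustration of the "one document per data entry" idea, here is a minimal sketch in plain JavaScript. The helper name toTradeDocs, the field name tradeType (renamed so it does not clash with the parent's type field), and the collection name trades are all assumptions made for the example, not anything from the original schema.

```javascript
// Sketch only: split one array-holding document into one document per trade.
// `toTradeDocs` and `tradeType` are hypothetical names chosen for this example.
function toTradeDocs(parent) {
  return parent.data.map(function (trade) {
    return {
      // Parent-level fields replicated into every trade document
      market: parent.market,
      type: parent.type,
      createdAt: parent.createdAt,
      // Per-trade fields
      date: trade.date,
      price: trade.price,
      amount: trade.amount,
      tid: trade.tid,
      tradeType: trade.type // renamed to avoid clashing with the parent's `type`
    };
  });
}

const parent = {
  market: "LTC_CNY",
  type: "recentTrades",
  createdAt: new Date("2015-03-01T12:08:23.007Z"),
  data: [
    { date: "1422168530", price: 13.8, amount: 0.203, tid: "2435402", type: "buy" },
    { date: "1422168529", price: 13.8, amount: 0.594, tid: "2435401", type: "buy" }
  ]
};

const tradeDocs = toTradeDocs(parent);

// In the mongo shell these could then be inserted with something like:
//   db.trades.insert(tradeDocs);   // `trades` is a hypothetical collection
```

Each trade document is written once and never grows, so document moves stop being a concern. If the de-duplication that $addToSet provided is still needed, a unique index on market and tid might stand in for it, assuming tid is unique within a market.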
should the document be defined with 75k nulls (data: [null,null...null]) on insert, and then perhaps use $set to replace the values over time?
No, this is not really helpful. The document size will be computed based on the BSON size of the null values in the array, so when you replace null with another type the size will increase and you'll get document rewrites anyway. You would need to preallocate the array with objects with all fields set to a default value for its type, e.g.
{
    "date" : ISODate("1970-01-01T00:00:00Z"), // use a date type instead of a string date
    "price" : 0,
    "amount" : 0,
    "tid" : "0000000", // assuming 7 character code - strings icky for default preallocation
    "type" : "none" // assuming it's "buy" or "sell", want a default as long as longest real values
}
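To make the preallocation idea concrete, here is a hedged sketch of how the placeholder array and the later in-place $set updates might be built. The helper names (makePlaceholder, preallocate, buildSlotUpdate) are made up for this example, and the slot index is something the application would have to track itself.

```javascript
// Placeholder with the same field types and string lengths as a real trade.
// Caveat: in the mongo shell every number is stored as a double, but some
// drivers serialize whole numbers as 32-bit ints, which would be smaller
// than a later 13.8. Worth verifying for whichever driver is in use.
function makePlaceholder() {
  return {
    date: new Date(0), // 1970-01-01T00:00:00Z, a real date type
    price: 0,
    amount: 0,
    tid: "0000000",    // assumes 7-character trade ids
    type: "none"       // as long as the longest real value ("sell")
  };
}

// Build an array of `slots` distinct placeholder objects for the initial insert.
function preallocate(slots) {
  const data = [];
  for (let i = 0; i < slots; i++) {
    data.push(makePlaceholder());
  }
  return data;
}

// Build a $set update that overwrites one slot, using dot notation
// with the array index (e.g. "data.2").
function buildSlotUpdate(index, trade) {
  const update = { $set: {} };
  update.$set["data." + index] = trade;
  return update;
}

const data = preallocate(4); // the question's case would use about 75000
const update = buildSlotUpdate(2, {
  date: new Date(1422168530 * 1000),
  price: 13.8,
  amount: 0.203,
  tid: "2435402",
  type: "buy"
});

// Usage in the shell would look roughly like:
//   Collection.update({ _id: 'rsMH4GxtduZZfxQrC' }, update);
```

This only avoids growth if every real value is no larger than its placeholder, which is why the string fields are the awkward part.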
MongoDB (with the MMAPv1 storage engine) uses the power-of-two allocation strategy to store your documents, which means the space allocated for each record is rounded up to the next power of two in bytes (32, 64, 128, 256, ...), not the square of the document size. Therefore, if your nested arrays don't grow the document beyond that allocated record size, mongo will not have to move the document.
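Power-of-two allocation is usually described as rounding a record's size up to the next power of two, which leaves some padding for growth. The arithmetic can be sketched as below. This is a simplified model, not MongoDB's actual allocator: it ignores the record header and any special handling of very large documents, and the function names and the 32-byte minimum are assumptions for the example.

```javascript
// Simplified model: round a size in bytes up to the next power of two,
// starting from an assumed minimum record size of 32 bytes.
function powerOfTwoSize(bytes) {
  let size = 32;
  while (size < bytes) {
    size *= 2;
  }
  return size;
}

// Would a document that was `originalBytes` at insert time still fit in its
// originally allocated record after growing to `grownBytes`?
function fitsInPlace(originalBytes, grownBytes) {
  return grownBytes <= powerOfTwoSize(originalBytes);
}

powerOfTwoSize(100);    // 128
powerOfTwoSize(129);    // 256
fitsInPlace(100, 120);  // true: 120 bytes still fit in the 128-byte record
fitsInPlace(100, 130);  // false: the document would have to move
```

Under this model a document gets, at most, room to roughly double before it moves, so an array growing from a few entries to tens of thousands would still be relocated many times.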
See: http://docs.mongodb.org/manual/core/storage/
Bottom line here is that any "document growth" is pretty much always going to result in the "physical move" of the storage allocation unless you have "pre-allocated" by some means on the original document submission. Yes there is "power of two" allocation, but this does not always mean anything valid to your storage case.
The additional "catch" here is on "capped collections", where indeed the "hidden catch" is that such "pre-allocation" methods are likely not to be "replicated" to other members in a replica set if those instructions fall outside of the "oplog" period where the replica set entries are applied.
Growing any structure beyond what is allocated from an "initial allocation" or the general tricks that can be applied will result in that document being "moved" in storage space when it grows beyond the space it was originally supplied with.
To ensure this does not happen, you "pre-allocate" to the expected size of your data on the original creation, with the obvious caveat of the condition already described.