Recently I noticed a huge performance difference between doing multiple upserts (via bulk operations) and doing an insert of multiple documents. I would like to know if I have this right:
- An upsert/update works like a find() followed by an update(), so it does two things: a read and a write.
- An insert just writes, so it is a lot faster.
Is that the cause of the performance difference?
If this is the case, then when I need a lot of writes regularly, instead of updating a document, I could write a new document with a createdOn field. To query, I would then just fetch documents sorted by createdOn DESC. Is this a good method, or is there a better way?
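The append-only pattern described above can be sketched in plain Python (a list stands in for the collection, and the helper names like `log_value` are illustrative, not the MongoDB driver API; with a real collection you would use insert_one for writes and a find with a descending sort on createdOn for reads):

```python
from datetime import datetime, timedelta

collection = []  # stand-in for a MongoDB collection

def log_value(doc, created_on):
    # Instead of updating an existing document, always insert a new one
    # stamped with a createdOn field (append-only writes).
    collection.append({**doc, "createdOn": created_on})

def latest(n=1):
    # Reading back: sort by createdOn DESC and take the newest document(s).
    return sorted(collection, key=lambda d: d["createdOn"], reverse=True)[:n]

t0 = datetime(2017, 1, 1)
log_value({"sensor": "a", "value": 1}, t0)
log_value({"sensor": "a", "value": 2}, t0 + timedelta(seconds=1))
log_value({"sensor": "a", "value": 3}, t0 + timedelta(seconds=2))
print(latest()[0]["value"])  # → 3
```

Note that every write here is a pure append; the read side pays the sorting cost instead, which is why an index on createdOn matters for this pattern.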
- I also wonder: if I have an index on the collection, might it speed up the update? But won't this index then slow down the write portion?
- With the second way, where I only do inserts, will it slow down when I have too many documents? Is it practical (to speed up the writes)?
- I have also tried increasing the connection pool size. I'm not sure what the optimum is, but with 20 I see I can handle about 20 queries per second through mongostat. I expected it to be a lot higher.
If you're inserting a document, MongoDB needs to check whether a document with the same ObjectId already exists; if it does, the document cannot be inserted.
The same applies to an update: it needs to check whether the document exists, otherwise the update cannot be performed. An update will only be slow if you are not finding the document by its ObjectId / an indexed field.
Otherwise, the performance of inserting and updating a document should be about the same.
Eg.....
So an insert looks like this //(Fast)
- (Check for document -> Not found -> Insert new document) Else
- (Check for document -> Found -> Cannot insert)
And an update with upsert (ObjectId available) //(Fast)
- (Check for document -> Not found -> Insert new document) Else
- (Check for document -> Found -> Update the document)
Or an update with upsert (without ObjectId) //This is slow
- (Scan for matching documents (Slow) -> Not found -> Insert new document) Else
- (Scan for matching documents (Slow) -> Found -> Update the documents)
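The three cases above can be simulated in plain Python (a dict keyed by _id stands in for the _id index, and a list scan stands in for an unindexed query; all names here are illustrative, not the MongoDB driver API):

```python
docs_by_id = {}  # simulates the _id index: O(1) lookup by key

def insert(_id, doc):
    # Insert: check for document by _id -> not found -> insert; found -> error.
    if _id in docs_by_id:
        raise KeyError("duplicate _id: cannot insert")
    docs_by_id[_id] = doc

def upsert_by_id(_id, doc):
    # Upsert with ObjectId: fast, because the _id lookup hits the index.
    docs_by_id[_id] = {**docs_by_id.get(_id, {}), **doc}

def upsert_by_field(field, value, doc):
    # Upsert without ObjectId: slow, because every document must be scanned.
    matches = [d for d in docs_by_id.values() if d.get(field) == value]
    if matches:
        for d in matches:
            d.update(doc)        # found -> update the documents
    else:
        insert(len(docs_by_id) + 1, {field: value, **doc})  # not found -> insert

insert(1, {"name": "a", "count": 0})
upsert_by_id(1, {"count": 1})               # found -> update
upsert_by_field("name", "b", {"count": 5})  # not found -> insert
print(docs_by_id[1]["count"])  # → 1
```

The dict lookup versus the list comprehension mirrors the difference between an indexed and an unindexed match: the first is constant-time, the second touches every document.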
I haven't found an 'official' explanation of how an upsert works in MongoDB, but yes, it is safe to assume that, since the operation is aimed at updating existing documents and only adds a document when no document matching the given criteria can be found.
If you add an index, then the upsert can become faster: after all, the index is used to 'find' the document. The caveat is in the field(s) the index covers versus the fields you're updating. If the updated portion is part of the index, you will see a performance impact when updating the document. If the updated portion is not part of the index, you will not incur a penalty for writing to the existing document. If a document is added, though, there is a minor performance impact, since the index also has to be updated.
But still: just adding a document will remain faster.
Therefore, if in your scenario you know that you don't want to update documents, then inserts are generally faster.
If you want to make sure that you do not add the same document twice, you can also opt for a unique index. Then a duplicate insert will simply fail.
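That failure mode can be simulated the same way (with a real collection you would create the index with the unique option and the driver would raise a duplicate-key error; the set used here as the duplicate check is just a stand-in for that index):

```python
docs = []
seen_created_on = set()  # simulates a unique index on createdOn

def insert_unique(doc):
    key = doc["createdOn"]
    if key in seen_created_on:
        # With a unique index, the duplicate is rejected at write time.
        raise ValueError("duplicate key error on createdOn")
    seen_created_on.add(key)
    docs.append(doc)

insert_unique({"createdOn": "2017-01-01T00:00:00", "value": 1})
try:
    insert_unique({"createdOn": "2017-01-01T00:00:00", "value": 2})
except ValueError as e:
    print(e)  # → duplicate key error on createdOn
```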
All in all it depends on the specific scenario, but based on the information I can extract from your question I think the best option is to simply insert the documents. Since you seem to ensure that the createdOn field makes the documents unique in your scenario, you only have to worry about the indexes used in your read scenarios.
Some extra info can be found on the mongo site:
https://docs.mongodb.com/v3.4/core/write-performance/
For more information on designing your (read) indexes, a pretty good explanation on finding out whether your indexes add anything to the query plans can be found here:
https://docs.mongodb.com/v3.4/tutorial/analyze-query-plan/
I hope this helps.