I don't understand the difference between create_index and ensure_index in pymongo. On the MongoDB indexes page, it says you can create an index by calling ensureIndex(). However, pymongo has two different methods, create_index and ensure_index, and the documentation for ensure_index says:
Unlike create_index(), which attempts to create an index unconditionally, ensure_index() takes advantage of some caching within the driver such that it only attempts to create indexes that might not already exist. When an index is created (or ensured) by PyMongo it is “remembered” for ttl seconds. Repeated calls to ensure_index() within that time limit will be lightweight - they will not attempt to actually create the index.
Am I right in understanding that ensure_index will create a permanent index, or do I need to use create_index for this?
Keep in mind that in MongoDB 3.x ensureIndex is deprecated and its use is discouraged.
Deprecated since version 3.0.0: db.collection.ensureIndex() is now an alias for db.collection.createIndex().
The same is true in pymongo:
DEPRECATED - Ensures that an index exists on this collection.
This means you should always use create_index.
@andreas-jung is right that ensure_index() is a wrapper over create_index(). I think the confusion arises from the phrase:
When an index is created (or ensured) by PyMongo it is “remembered” for ttl seconds.
It's not that the index is temporary or "transient". What happens is that, for the specified number of seconds, a call to ensure_index() that tries to create the same index again has no effect and does not call create_index() underneath. Once that "cache" expires, a call to ensure_index() calls create_index() underneath again.
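A small pure-Python model may make the expiry behaviour concrete. It is only a sketch under stated assumptions, not PyMongo's implementation: CachedIndexer, server_calls and the injectable clock are hypothetical names, and a list stands in for the requests a real driver would send to MongoDB. ensure_index() delegates to create_index() unless the same index was created less than ttl seconds ago.

```python
import time


class CachedIndexer:
    """Hypothetical model of a driver-side index cache (not PyMongo's real code)."""

    def __init__(self, ttl=300, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock        # injectable so the demo below is deterministic
        self._cache = {}          # (collection, key spec) -> expiry timestamp
        self.server_calls = []    # stands in for createIndex requests sent to MongoDB

    def create_index(self, collection, keys):
        # Unconditional: always contacts the "server", then refreshes the cache.
        self.server_calls.append((collection, keys))
        self._cache[(collection, keys)] = self.clock() + self.ttl
        return keys

    def ensure_index(self, collection, keys):
        # Thin wrapper: skip the request while a cached entry is still fresh.
        expiry = self._cache.get((collection, keys))
        if expiry is not None and self.clock() < expiry:
            return None           # lightweight: nothing is sent
        return self.create_index(collection, keys)


now = [0.0]                                       # fake clock, in seconds
indexer = CachedIndexer(ttl=300, clock=lambda: now[0])
spec = (("subject", 1),)                          # tuple so it can be a dict key

first = indexer.ensure_index("posts", spec)       # nothing cached: creates
now[0] = 120
second = indexer.ensure_index("posts", spec)      # within ttl: skipped, returns None
now[0] = 301
third = indexer.ensure_index("posts", spec)       # cache expired: creates again
print(len(indexer.server_calls))                  # 2
```

Nothing in this model makes an index temporary: the cache only decides whether a request is sent, and whatever the server created stays there.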
I perfectly understand your confusion, because quite frankly PyMongo's docs don't do a very good job of explaining how this works, but if you head over to the Ruby docs, the explanation is a little clearer:
- (String) ensure_index(spec, opts = {})
Calls create_index and sets a flag to not do so again for another X minutes. This time can be specified as an option when initializing a Mongo::DB object as options[:cache_time]. Any changes to an index will be propagated through regardless of cache time (e.g., a change of index direction).
The parameters and options for this method are the same as those for Collection#create_index.
Examples:
Call sequence:
Time t: @posts.ensure_index([['subject', Mongo::ASCENDING]]) -- calls create_index and sets the 5 minute cache
Time t+2min: @posts.ensure_index([['subject', Mongo::ASCENDING]]) -- doesn't do anything
Time t+3min: @posts.ensure_index([['something_else', Mongo::ASCENDING]]) -- calls create_index and sets 5 minute cache
Time t+10min: @posts.ensure_index([['subject', Mongo::ASCENDING]]) -- calls create_index and resets the 5 minute counter
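The Ruby call sequence can be replayed in a few lines of Python. This is a simplified, hypothetical model of the 5 minute cache described above, not code from either driver: replay() returns, for each (minute, field, direction) call, whether create_index would run. Keying the cache by field and direction is an assumption that mirrors the note that a change of index direction is propagated regardless of cache time.

```python
def replay(calls, cache_minutes=5):
    """For each (minute, field, direction) call, report whether create_index runs."""
    expires_at = {}   # (field, direction) -> minute at which the cache entry expires
    hits = []
    for minute, field, direction in calls:
        key = (field, direction)   # a changed direction is a new key, so it is never cached
        cached = key in expires_at and minute < expires_at[key]
        if not cached:
            # create_index is called and the cache entry is (re)set
            expires_at[key] = minute + cache_minutes
        hits.append(not cached)
    return hits


ASC = 1  # stands in for Mongo::ASCENDING
sequence = [
    (0, "subject", ASC),
    (2, "subject", ASC),
    (3, "something_else", ASC),
    (10, "subject", ASC),
]
for (minute, field, _), hit in zip(sequence, replay(sequence)):
    print(f"t+{minute}min {field}: {'calls create_index' if hit else 'does nothing'}")
```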
I'm not claiming the drivers work exactly the same way; it's just that, for illustration purposes, their explanation is a little better IMHO.
The ensureIndex method in the Interactive Shell and ensure_index in the Python driver are different things, although the same word is used. Both the create_index and ensure_index methods of the Python driver create an index permanently.
In such a situation one might use ensure_index with a reasonable TTL, because I am not sure whether create_index would recreate the index each time you call it. Recreating an index is normally undesirable and could be a heavy operation. But even ensure_index (of the Python or Ruby driver) could possibly recreate the index whenever the TTL has expired, when you call it from a different client instance, or after a restart. I am not sure about this.
Maybe an even better option is to first check, using the method index_information(), whether the index already exists. If it does, you would not create it again.
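A minimal sketch of the check-before-create idea, assuming the dictionary shape PyMongo's index_information() returns ({index_name: {"key": [(field, direction), ...], ...}}). FakeCollection and create_if_missing are hypothetical names used so the example runs without a server; with a real pymongo Collection you would pass that instead. Comparing key lists rather than names avoids missing an index that was created under a custom name.

```python
class FakeCollection:
    """Hypothetical in-memory stand-in for a pymongo Collection."""

    def __init__(self):
        self._indexes = {"_id_": {"key": [("_id", 1)]}}
        self.create_calls = 0

    def index_information(self):
        # Same shape as PyMongo's result: {name: {"key": [(field, direction), ...]}}
        return {name: dict(info) for name, info in self._indexes.items()}

    def create_index(self, keys):
        self.create_calls += 1
        # MongoDB's default naming convention: field_direction pairs joined by "_"
        name = "_".join(f"{field}_{direction}" for field, direction in keys)
        self._indexes.setdefault(name, {"key": list(keys)})
        return name


def create_if_missing(collection, keys):
    """Create the index only if no existing index has exactly these keys."""
    wanted = list(keys)
    for name, info in collection.index_information().items():
        if list(info["key"]) == wanted:
            return name                 # already there: skip the create call
    return collection.create_index(wanted)


posts = FakeCollection()
first = create_if_missing(posts, [("subject", 1)])   # not found, so it is created
second = create_if_missing(posts, [("subject", 1)])  # found, create_index not called
print(first, second, posts.create_calls)             # subject_1 subject_1 1
```

The check and the create are two separate round trips, so another client could create the index in between; that is harmless, because the database does not rebuild an index that already exists.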
Here is how the term ensure_index (or ensureIndex) is used with two different meanings:
1) It creates an index if it does not yet exist in the database
This is what the Interactive Shell method ensureIndex() does:
http://www.mongodb.org/display/DOCS/Indexes#Indexes-Basics
The Node.JS MongoDB Driver also behaves this way:
https://github.com/mongodb/node-mongodb-native/blob/master/lib/mongodb/collection.js
(Search for function ensureIndex in the file collection.js.)
2) It creates an index if it is not in the 'driver cache'
The same identifier is used with a different meaning here, which I find confusing.
The Python and Ruby drivers store information in memory about indexes that were created recently, and they call this behaviour 'caching'.
They do not tell the database about this caching.
The result of this mechanism is that if you call create_index or ensure_index for the first time with a TTL (time to live) value, the driver inserts the index into the database, remembers this insertion, and stores the TTL information in memory. What is cached is which index it was and when it was created.
The next time you call ensure_index with the same index of the same collection on the same driver instance, the ensure_index command will insert the index again only if TTL seconds have passed since it was cached.
If you call create_index, the index is always inserted, no matter how much time has passed since the first call (and, of course, on the first call too).
This is the Python driver; search for def ensure_index in the file collection.py:
https://github.com/mongodb/mongo-python-driver/blob/master/pymongo/collection.py
And this is the Ruby driver; search for def ensure_index in the file collection.rb:
https://github.com/mongodb/mongo-ruby-driver/blob/master/lib/mongo/collection.rb
(Note that different client instances do not know about each other's caching; this information is kept in memory only, per instance. If you restart the client application, the new instance does not know about the old 'cached' index inserts. Other clients do not know either; they do not tell each other.)
I have not yet been able to fully understand what happens in the database when the Python or Ruby driver inserts an index that is already there. I suspect nothing happens in this case, which makes more sense and would also match the behaviour of the Interactive Shell and the JS driver.
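The two meanings, and the per-instance nature of the driver cache, can be contrasted in one hypothetical model. FakeDatabase plays meaning 1 and assumes, as MongoDB's createIndex documentation states, that asking for an index that already exists does not rebuild it; DriverInstance plays meaning 2 with a private in-memory cache. Neither class is real driver code, and the frozen default clock is only there to keep the demo deterministic.

```python
class FakeDatabase:
    """Meaning 1: the database only builds an index that does not exist yet."""

    def __init__(self):
        self.indexes = set()   # (collection, field) pairs that exist
        self.requests = 0      # create requests received from any client
        self.builds = 0        # indexes actually built

    def create_index(self, collection, field):
        self.requests += 1
        if (collection, field) not in self.indexes:
            self.indexes.add((collection, field))
            self.builds += 1   # an existing index is left alone


class DriverInstance:
    """Meaning 2: a client-side cache, private to one driver instance."""

    def __init__(self, database, ttl=300, clock=lambda: 0.0):
        self.database = database
        self.ttl = ttl
        self.clock = clock     # frozen clock by default, for a deterministic demo
        self._cache = {}       # (collection, field) -> expiry; never shared

    def ensure_index(self, collection, field):
        key = (collection, field)
        if key in self._cache and self.clock() < self._cache[key]:
            return             # cache hit: the database is not contacted
        self.database.create_index(collection, field)
        self._cache[key] = self.clock() + self.ttl


db = FakeDatabase()
app_a = DriverInstance(db)
app_b = DriverInstance(db)     # e.g. a second process, or app_a after a restart

app_a.ensure_index("posts", "subject")  # cache miss: request sent, index built
app_a.ensure_index("posts", "subject")  # cache hit in app_a: nothing sent
app_b.ensure_index("posts", "subject")  # app_b has its own empty cache: request sent
print(db.requests, db.builds)           # 2 1
```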
All indexes are permanent.
ensure_index() is just a tiny wrapper around create_index().
"The ensureIndex() function only creates the index if it does not exist."
There is no such thing as a transient or temporary index.
I would recommend creating a metaclass and an ORM.
From the metaclass __init__, call an init_schema method that initializes the counters, schema, keys, etc.
This way you avoid calling ensure_index on every query or collection update :)
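A minimal sketch of that suggestion, with hypothetical names throughout (SchemaMeta, Model, init_schema, FakeCollection): the metaclass __init__ runs once when each model class is defined, so index creation happens once per process instead of on every query. FakeCollection only records the requests; with pymongo you would assign a real Collection, whose create_index accepts the same list of (field, direction) pairs.

```python
class FakeCollection:
    """Hypothetical stand-in for a pymongo Collection; records index requests."""

    def __init__(self, name):
        self.name = name
        self.index_requests = []

    def create_index(self, keys):
        self.index_requests.append(keys)


class SchemaMeta(type):
    """Runs init_schema() once, when each model class is defined."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        if "collection" in namespace:       # skip the abstract base class
            cls.init_schema()


class Model(metaclass=SchemaMeta):
    indexes = []

    @classmethod
    def init_schema(cls):
        # Set up indexes (and counters, keys, ...) once per process.
        for keys in cls.indexes:
            cls.collection.create_index(keys)

    @classmethod
    def find_by_subject(cls, subject):
        # Queries no longer need to call ensure_index/create_index.
        return {"collection": cls.collection.name, "filter": {"subject": subject}}


class Post(Model):
    collection = FakeCollection("posts")
    indexes = [[("subject", 1)], [("author", 1), ("created", -1)]]


Post.find_by_subject("mongo")
Post.find_by_subject("python")
print(len(Post.collection.index_requests))  # 2: one per declared index, not per query
```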