Are there any tools to estimate index size in Mong

2019-01-13 20:05发布

问题:

I'm looking for a tool to get a decent estimate of how large a MongoDB index will be based on a few signals like:

  • How many documents in my collection
  • The size of the indexed field(s)
  • The size of the _id I'm using if not ObjectId
  • Geo/Non-geo

Has anyone stumbled across something like this? I can imagine it would be extremely useful given Mongo's performance degradation once it hits the memory wall and documents start getting paged out to disk. If I have a functioning database and want to add another index, the only way I'll know if it will be too big is to actually add it.

It wouldn't need to be accurate down to the bit, but with some assumptions about B-Trees and the index implementation I'm sure it could be reasonable enough to be helpful.

If this doesn't exist already I'd like to build and open source it, so if I've missed any required parameters for this calculation please include in your answer.

回答1:

I just spoke with some of the 10gen engineers and there isn't a tool but you can do a back of the envelope calculation that is based on this formula:

2 * [ n * ( 18 bytes overhead + avg size of indexed field + 5 or so bytes of conversion fudge factor ) ]

Where n is the number of documents you have.

The overhead and conversion padding are mongo specific but the 2x comes from the b-tree data structure being roughly half full (but having allocated 100% of the space a full tree would require) in the worst case.

I'd explain more but I'm learning about it myself at the moment. This presentation will have more details: http://www.10gen.com/presentations/mongosp-2011/mongodb-internals



回答2:

You can check the sizes of the indexes on a collection by using command:

db.collection.stats()

More details here: http://docs.mongodb.org/manual/reference/method/db.collection.stats/#db.collection.stats



回答3:

Another way to calculate is to ingest ~1000 or so documents into every collection, in other words, build a small scale model of what you're going to end up with in production, create indexes or what have you and calculate the final numbers based on db.collection.stats() average.

Does this make sense? :)