Ridiculously slow mongoDB query on small collectio

So I have a super simple database in mongoDB with a few collections:

> show collections
Aggregates <-- count: 92
Users <-- count: 68222
Pages <-- count: 1728288847, about 1.1TB
system.indexes

The Aggregates collection is an aggregate of the Pages collection, and each document looks like this:

> db.Aggregates.findOne()
{
        "_id" : ObjectId("50f237126ba71610eab3aaa5"),
        "daily_total_pages" : 16929799,
        "day" : 21,
        "month" : 9,
        "year" : 2011
}

Very simple. However, let's try and get the total page loads by adding all 92 days daily page loads together:

>>> def get_total():
...     start = datetime.now()
...     print sum([x['daily_total_pages'] for x in c.Aggregates.find()])
...     end = datetime.now()
...     print (end-start).seconds
...
>>> get_total()
1728288847
43

43 seconds?!??!??!?!

Those 92 aggregate results are tiny! I might as well just store them in a text file, this is crazy.

Or are they tiny? How big are they on disk according to mongo?

> db.Aggregates.stats()
{
        "ns" : "c.AggregateResults",
        "count" : 92,
        "size" : 460250104,
        "avgObjSize" : 5002718.521739131,
        "storageSize" : 729464832,
        "numExtents" : 7,
        "nindexes" : 2,
        "lastExtentSize" : 355647488,
        "paddingFactor" : 1.0690000000000066,
        "systemFlags" : 1,
        "userFlags" : 0,
        "totalIndexSize" : 16352,
        "indexSizes" : {
                "_id_" : 8176,
                "date_1" : 8176
        },
        "ok" : 1
}

438 megabytes total for those tiny daily numbers? Each one is approximately 280 bytes, so they should be a maximum of 25~30kb total. So the storage is huge and the query is super slow. Is it possible it could be fragmented on disk? I created the aggregates after inserting the documents into the the full Pages collection.

Anyone have any insights into this madness? :O

Edit: Solved with more specific find() query by Jared. The video below that Sammaye provided also gives some very interesting storage insights.

Edit 2: So I found out that using sys.getsizeof() is a really unreliable method of finding out your document's sizes since it doesn't recurse down any trees. So actually my docs were quite large, and the best thing to do was to use find({}, {'daily_page_loads'}) as a more specific query!

标签： mongodb optimization query-optimization pymongo database

1条回答

成全新的幸福

2楼-- · 2020-06-04 02:59

The avgObjSize is out of line with the 280 byte estimate. It's saying your objects are averaging around 5MB and storageSize is near 1GB. If you're memory-constrained running a query that needs to access all 1GB of file would cause lots of page faults.

Have you tried compacting?

db.runCommand({compact: 'Aggregates'})

or repairing?

db.repairDatabase()

If that doesn't work try pulling back just those fields needed for the sum rather than pulling the whole document. It may be that those documents are actually 5MB and the time is spent pulling data over the wire.

def get_total():
    start = datetime.now()
    print sum([x['daily_total_pages'] for x in c.Aggregates.find({}, {"daily_total_pages": 1})])
    end = datetime.now()
    print (end-start).seconds

0人赞添加讨论(0) 举报

Ridiculously slow mongoDB query on small collectio

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间