MongoDB Correct Schema for aggregated data

2019-01-29 08:14发布

问题:

I have a big collection that holds lots of stats, since I want to generate reports, I am running a daily cron which aggregates data from the main collection to a smaller one. the thing is: what is the correct way to store all the aggregated data?

Method 1:

{
    'Y2015': {
        'M04': {
            'D18': 100,
            'D19': 200
        }
    },
    'order_id': 'VjprK',
    'user_id': '777'
}

{
    'Y2015': {
        'M04': {
            'D18': 100,
            'D19': 20
        }
    },
    'order_id': 'LaOPX',
    'user_id': '777'
}

{
    'Y2015': {
        'M04': {
            'D18': 100,
            'D19': 50
        }
    },
    'order_id': 'cYwxf',
    'user_id': '777'
}

Method 2:

{
    'order_id': 'VjprK',
    'user_id': '777',
    data {
        'MongoDate(2015-04-18)' : 100,
        'MongoDate(2015-04-19)' : 200,
        'MongoDate(2015-04-20)' : 300,
        'MongoDate(2015-04-21)' : 400,
    }
}

In the future, I want to query by date ranges, so it feels like method 2.

Any recommendations?

回答1:

Would recommend further restructuring the schema in Method 2 to follow this schema:

/* 0 */
{
    "_id" : ObjectId("5577fd322ab13c8cacdd0e70"),
    "order_id" : "VjprK",
    "user_id" : "777",
    "data" : [ 
        {
            "order_date" : ISODate("2015-04-18T08:57:42.514Z"),
            "amount" : 100
        }, 
        {
            "order_date" : ISODate("2015-04-19T08:57:42.514Z"),
            "amount" : 200
        }, 
        {
            "order_date" : ISODate("2015-04-20T08:57:42.514Z"),
            "amount" : 300
        }, 
        {
            "order_date" : ISODate("2015-04-21T08:57:42.514Z"),
            "amount" : 400
        }
    ]
}

which you can then aggregate with a given date range, say from 2015-04-18 to 2015-04-19. Consider the following pipeline:

var start = new Date(2015, 3, 18),
    end = new Date(2015, 3, 20);

db.orders.aggregate([
    {
        "$match": {
            "user_id": "777",
            "data.order_date": {
                "$gte": start,
                "$lt": end
            }
        }
    },
    {
        "$unwind": "$data"
    },
    {
        "$match": {
            "data.order_date": {
                "$gte": start,
                "$lt": end
            }
        }
    },
    {
        "$group": {
            "_id": "$user_id",
            "total": {
                "$sum": "$data.amount"
            }
        }
    }    
])

Sample Output

/* 0 */
{
    "result" : [ 
        {
            "_id" : "777",
            "total" : 300
        }
    ],
    "ok" : 1
}


回答2:

Personally, seems it looks like these are "delivery dates" for parts of an order I would do this:

{
    'order_id': 'LaOPX',
    'user_id': '777',
    'parts': [
        { "date": ISODate("2015-04-18T00:00:00Z"), "qty": 100 },
        { "date": ISODate("2015-04-19T00:00:00Z"), "qty": 20 }
    ]
}

Where the dates where "actual date objects" in the database . If you wanted everything for all of user "777" data in all records then you can do:

db.collection.aggregate([
    // Match the user between dates
    { "$match": { 
        "user_id": "777", 
        "parts.date": { 
            "$gte": new Date("2015-04-18"), "$lt": new Date("2015-04-20")
        }
    }},

    // Unwind the array entries
    { "$unwind": "$parts" },

    // Filter the required dates
    { "$match": { 
        "parts.date": { 
            "$gte": new Date("2015-04-18"), "$lt": new Date("2015-04-20")
        }
    }},

    // Group per user
    { "$group": {
        "_id": "$user_id",
        "total": { "$sum": "$parts.qty" }
    }}
])

It's much more flexible to use real dates in the data as range queries will always work as they should



回答3:

I'm working with time series and MongoDB, and I use a schema based on this.

{
  timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
  type: “memory_used”,
  values: {
    0: 999999,
    …  
    37: 1000000,
    38: 1500000,
    … 
    59: 2000000
  }
}

It's interesting to watch this video too.



回答4:

This schema simplify a lot range queries and array is a common way of store these data series.

{
    'order_id': 'VjprK',
    'user_id': '777',
    'data': [
        {
            date: MongoDate(2015-04-18),
            value: 100
        },
        {
            date: MongoDate(2015-04-19),
            value: 200
        }
        ...
    ]
}


标签: mongodb