可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I have a big collection that holds lots of stats, since I want to generate reports, I am running a daily cron which aggregates data from the main collection to a smaller one. the thing is: what is the correct way to store all the aggregated data?
Method 1:
{
'Y2015': {
'M04': {
'D18': 100,
'D19': 200
}
},
'order_id': 'VjprK',
'user_id': '777'
}
{
'Y2015': {
'M04': {
'D18': 100,
'D19': 20
}
},
'order_id': 'LaOPX',
'user_id': '777'
}
{
'Y2015': {
'M04': {
'D18': 100,
'D19': 50
}
},
'order_id': 'cYwxf',
'user_id': '777'
}
Method 2:
{
'order_id': 'VjprK',
'user_id': '777',
data {
'MongoDate(2015-04-18)' : 100,
'MongoDate(2015-04-19)' : 200,
'MongoDate(2015-04-20)' : 300,
'MongoDate(2015-04-21)' : 400,
}
}
In the future, I want to query by date ranges, so it feels like method 2.
Any recommendations?
回答1:
Would recommend further restructuring the schema in Method 2 to follow this schema:
/* 0 */
{
"_id" : ObjectId("5577fd322ab13c8cacdd0e70"),
"order_id" : "VjprK",
"user_id" : "777",
"data" : [
{
"order_date" : ISODate("2015-04-18T08:57:42.514Z"),
"amount" : 100
},
{
"order_date" : ISODate("2015-04-19T08:57:42.514Z"),
"amount" : 200
},
{
"order_date" : ISODate("2015-04-20T08:57:42.514Z"),
"amount" : 300
},
{
"order_date" : ISODate("2015-04-21T08:57:42.514Z"),
"amount" : 400
}
]
}
which you can then aggregate with a given date range, say from 2015-04-18
to 2015-04-19
. Consider the following pipeline:
var start = new Date(2015, 3, 18),
end = new Date(2015, 3, 20);
db.orders.aggregate([
{
"$match": {
"user_id": "777",
"data.order_date": {
"$gte": start,
"$lt": end
}
}
},
{
"$unwind": "$data"
},
{
"$match": {
"data.order_date": {
"$gte": start,
"$lt": end
}
}
},
{
"$group": {
"_id": "$user_id",
"total": {
"$sum": "$data.amount"
}
}
}
])
Sample Output
/* 0 */
{
"result" : [
{
"_id" : "777",
"total" : 300
}
],
"ok" : 1
}
回答2:
Personally, seems it looks like these are "delivery dates" for parts of an order I would do this:
{
'order_id': 'LaOPX',
'user_id': '777',
'parts': [
{ "date": ISODate("2015-04-18T00:00:00Z"), "qty": 100 },
{ "date": ISODate("2015-04-19T00:00:00Z"), "qty": 20 }
]
}
Where the dates where "actual date objects" in the database . If you wanted everything for all of user "777" data in all records then you can do:
db.collection.aggregate([
// Match the user between dates
{ "$match": {
"user_id": "777",
"parts.date": {
"$gte": new Date("2015-04-18"), "$lt": new Date("2015-04-20")
}
}},
// Unwind the array entries
{ "$unwind": "$parts" },
// Filter the required dates
{ "$match": {
"parts.date": {
"$gte": new Date("2015-04-18"), "$lt": new Date("2015-04-20")
}
}},
// Group per user
{ "$group": {
"_id": "$user_id",
"total": { "$sum": "$parts.qty" }
}}
])
It's much more flexible to use real dates in the data as range queries will always work as they should
回答3:
I'm working with time series and MongoDB, and I use a schema based on this.
{
timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
type: “memory_used”,
values: {
0: 999999,
…
37: 1000000,
38: 1500000,
…
59: 2000000
}
}
It's interesting to watch this video too.
回答4:
This schema simplify a lot range queries and array is a common way of store these data series.
{
'order_id': 'VjprK',
'user_id': '777',
'data': [
{
date: MongoDate(2015-04-18),
value: 100
},
{
date: MongoDate(2015-04-19),
value: 200
}
...
]
}