I read this and tried experimenting with it.
Below is my schema.
SCHEMA:
{
"s" : "CB",
"c" : "REQ_RCV",
"e" : "sms_click",
"st" : "i",
"b" : "2",
"a" : "1",
"u" : "b1_h1_d1_m1_user_2",
"c#" : "b1_h1_d1_m1_cr-2",
"@" : ISODate("2016-10-01T06:03:00.000Z"), //Indexed
"@h" : "16100106", //hourly bucket
"@d" : "161001", //Indexed
"@m" : "1610"
}
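For reference, the single-field indexes implied by the //Indexed comments above and by the planSummary lines in the log below would look like this (the @h index is my own assumption, added because the sample query further down ranges over @h):

db.template3.createIndex({ "@": 1 });   // full event timestamp (used by the $gte/$lte range query)
db.template3.createIndex({ "@d": 1 });  // daily bucket (used by the equality match on @d)
db.template3.createIndex({ "@h": 1 });  // hourly bucket -- assumed, needed for the range query on @h below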
And below is the slow-query log output for the two aggregations:
> 2017-01-22T13:43:47.764+0530 I COMMAND [conn34] command test-analytics.template3 appName: "MongoDB Shell" command: aggregate {
> aggregate: "template3", pipeline: [ { $match: { @: { $gte: new
> Date(1483228800000), $lte: new Date(1483315199000) } } }, { $group: {
> _id: { b: "$b", HOURLY: "$@h", s: "$s" }, count: { $sum: 1.0 } } }, { $project: { _id: 0.0, BUCKET: "$_id.b", SERVICE: "$_id.s", TIME:
> "$_id.HOURLY", count: 1.0 } }, { $sort: { SERVICE: 1.0, BUCKET: 1.0,
> TIME: 1.0 } } ], cursor: {} } planSummary: IXSCAN { @: 1.0 }
> keysExamined:106888 docsExamined:106888 hasSortStage:1
> cursorExhausted:1 numYields:925 nreturned:96 reslen:7095 locks:{
> Global: { acquireCount: { r: 1860 } }, Database: { acquireCount: { r:
> 930 } }, Collection: { acquireCount: { r: 929 } } }
> protocol:op_command **3499ms**
> 2017-01-22T13:44:24.825+0530 I COMMAND [conn34] command test-analytics.template3 appName: "MongoDB Shell" command: aggregate {
> aggregate: "template3", pipeline: [ { $match: { @d: "170101" } }, {
> $group: { _id: { b: "$b", HOURLY: "$@h", s: "$s" }, count: { $sum: 1.0
> } } }, { $project: { _id: 0.0, BUCKET: "$_id.b", SERVICE: "$_id.s",
> TIME: "$_id.HOURLY", count: 1.0 } }, { $sort: { SERVICE: 1.0, BUCKET:
> 1.0, TIME: 1.0 } } ], cursor: {} } planSummary: IXSCAN { @d: 1.0 } keysExamined:106888 docsExamined:106888 hasSortStage:1
> cursorExhausted:1 numYields:865 nreturned:96 reslen:7095 locks:{
> Global: { acquireCount: { r: 1740 } }, Database: { acquireCount: { r:
> 870 } }, Collection: { acquireCount: { r: 869 } } }
> protocol:op_command **1294ms**
Questions:
- Though both queries examine the same number of documents, why is there a time difference in the output?
- Are the $lte and $gte operators slow only on date ranges, or are they slow on numeric comparisons as well?
- Since bucketing gives a faster response, how do I use bucketing for range queries? I could make one aggregate call per time bucket to cover a range, but that adds round-trip time. Any suggestions?
- Is it possible to use $and to combine two $match stages in an aggregation pipeline to support range bucketing? The pipeline accepts multiple $match stages, but the output of the first $match is fed into the second; what I want instead is to union the results of the individual $match conditions and pass them to the next pipeline stage.
Possible answer for Q4:
db.template3.aggregate([
{
$match: {
$or: [
{"@d":"170301"},
{"@d":"170306"},
{"@d":"170202"},
{"@d":"170303"},
{"@d":"170304"},
{"@d":"170305"}
]
}
},
{ $project: { _id: 0, "b": 1, "s": 1, "@h": 1 } },
{
$group: {
_id: {"b": "$b", "HOURLY": "$@h", "s": "$s" },
count: { $sum: 1 }
}
},
{
$project: {
_id: 0,
"BUCKET": "$_id.b",
"SERVICE": "$_id.s",
"TIME": "$_id.HOURLY",
count: 1
}
},
{ $sort: { "SERVICE": 1, "BUCKET": 1, "TIME": 1 } }
]);
With this approach we can mix bucket granularities (daily and monthly) and each $or clause can still use its own index. See https://docs.mongodb.com/manual/reference/operator/query/or/#or-clauses-and-indexes.
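For example, a range covering the whole of March plus one trailing February day could mix the monthly and daily buckets in the same $or (the bucket values here are illustrative, and this assumes @m is indexed as well):

db.template3.aggregate([
  {
    $match: {
      $or: [
        { "@d": "170228" },   // one extra day via the daily bucket
        { "@m": "1703" }      // the whole of March via the monthly bucket (assumes an index on @m)
      ]
    }
  },
  { $group: { _id: { "b": "$b", "HOURLY": "$@h", "s": "$s" }, count: { $sum: 1 } } },
  { $project: { _id: 0, "BUCKET": "$_id.b", "SERVICE": "$_id.s", "TIME": "$_id.HOURLY", count: 1 } }
]);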
Sample query:
db.template3.aggregate([
{$match:{"@h":{$gte : 17020511, $lte : 17030511}, "st":"i"}},
{$project : {"_id":0, "@h":1,"c":1, "@m":1}},
{$group:{_id:{ "HOURLY":"$@h", "c":"$c"}, count:{$sum:1}}},
{$project : {_id:0, "COUNTER":"$_id.c","TIME":"$_id.HOURLY", count:1}},
{$sort:{"COUNTER":1,"TIME":1}}
]);
Output:
{ "count" : 2255, "COUNTER" : "REQ_RCVD", "TIME" : 17020511 }
{ "count" : 28888, "COUNTER" : "REQ_RCVD", "TIME" : 17020600 }
{ "count" : 37613, "COUNTER" : "REQ_RCVD", "TIME" : 17020601 }
{ "count" : 6723, "COUNTER" : "REQ_RCVD", "TIME" : 17020602 }
{ "count" : 14057, "COUNTER" : "REQ_RCVD", "TIME" : 17020603 }
{ "count" : 12405, "COUNTER" : "REQ_RCVD", "TIME" : 17020604 }
{ "count" : 2392, "COUNTER" : "REQ_RCVD", "TIME" : 17020611 }
{ "count" : 28784, "COUNTER" : "REQ_RCVD", "TIME" : 17020700 }
{ "count" : 37494, "COUNTER" : "REQ_RCVD", "TIME" : 17020701 }
{ "count" : 6697, "COUNTER" : "REQ_RCVD", "TIME" : 17020702 }
{ "count" : 13930, "COUNTER" : "REQ_RCVD", "TIME" : 17020703 }
{ "count" : 12493, "COUNTER" : "REQ_RCVD", "TIME" : 17020704 }
{ "count" : 2225, "COUNTER" : "REQ_RCVD", "TIME" : 17020711 }
{ "count" : 28821, "COUNTER" : "REQ_RCVD", "TIME" : 17020800 }
{ "count" : 37949, "COUNTER" : "REQ_RCVD", "TIME" : 17020801 }
{ "count" : 6676, "COUNTER" : "REQ_RCVD", "TIME" : 17020802 }
{ "count" : 14039, "COUNTER" : "REQ_RCVD", "TIME" : 17020803 }
{ "count" : 12349, "COUNTER" : "REQ_RCVD", "TIME" : 17020804 }
{ "count" : 2332, "COUNTER" : "REQ_RCVD", "TIME" : 17020811 }
{ "count" : 28379, "COUNTER" : "REQ_RCVD", "TIME" : 17020900 }
OPTIMIZATION
I suspect the extra time is spent reading non-indexed fields from the documents, hence docsExamined: 106888.

There is some improvement when I place a $project stage before $group.

I have changed the "@h" field's data type from String to Integer (NumberInt); I think it will improve things some more.
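A minimal sketch of that conversion in the mongo shell (my own sketch, not from the original post); it rewrites every string-valued @h as a NumberInt:

db.template3.find({ "@h": { $type: 2 } }).forEach(function (doc) {  // $type 2 = string
    db.template3.update(
        { _id: doc._id },
        { $set: { "@h": NumberInt(doc["@h"]) } }
    );
});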