I'm trying to count distinct values of multiple fields By one MongoDB Aggregation query.
So here's my data:
{
"car_type": "suv",
"color": "red",
"num_doors": 4
},
{
"car_type": "hatchback",
"color": "blue",
"num_doors": 4
},
{
"car_type": "wagon",
"color": "red",
"num_doors": 4
}
I want a distinct count of each field:
distinct_count_car_type=3
distinct_count_color=2
distinct_count_num_doors=1
I was able to group multiple fields and then do a distinct count but it could only give me a count on the first field. Not all of them.
And also it's a large set of data.
You're looking for the power of ... $objectToArray
!
db.foo.aggregate([
{$project: {x: {$objectToArray: "$$CURRENT"}}}
,{$unwind: "$x"}
,{$match: {"x.k": {$ne: "_id"}}}
,{$group: {_id: "$x.k", y: {$addToSet: "$x.v"}}}
,{$addFields: {size: {"$size":"$y"}} }
]);
This will yield:
{ "_id" : "num_doors", "y" : [ 4 ], "size" : 1 }
{ "_id" : "color", "y" : [ "blue", "red" ], "size" : 2 }
{
"_id" : "car_type",
"y" : [
"wagon",
"hatchback",
"suv"
],
"size" : 3
}
You can $project
or $addFields
as you see fit to include or exclude the set of unique values or the size.
Running the following aggregate pipeline should give you the desired result:
db.collection.aggregate([
{
"$group": {
"_id": null,
"distinct_car_types": { "$addToSet": "$car_type" },
"distinct_colors": { "$addToSet": "$color" },
"distinct_num_doors": { "$addToSet": "$num_doors" }
}
},
{
"$project": {
"distinct_count_car_type": { "$size": "$distinct_car_types" },
"distinct_count_color": { "$size": "$distinct_colors" },
"distinct_count_num_doors": { "$size": "$distinct_num_doors" }
}
}
])