I got dataset as follow :
{
"_id" : ObjectId("592d4f43d69b643ac0cb9148"),
"timestamp" : ISODate("2017-03-01T16:58:00.000Z"),
"Technique-Meteo_Direction moyenne du vent_Mean value wind direction[]" : 0.0,
"Technique-Meteo_Précipitations_Precipitation status[]" : 0.0,
"Technique-Meteo_Direction du vent_Wind direction[]" : 0.0
}
/* 5 */
{
"_id" : ObjectId("592d4f43d69b643ac0cb9149"),
"timestamp" : ISODate("2017-03-01T17:09:00.000Z"),
"Technique-Meteo_Direction moyenne du vent_Mean value wind direction[]" : 0.0,
"Technique-Meteo_Précipitations_Precipitation status[]" : 0.0,
"Technique-Meteo_Direction du vent_Wind direction[]" : 0.0
}
/* 6 */
{
"_id" : ObjectId("592d3a6cd69b643ac0cae395"),
"timestamp" : ISODate("2017-01-30T09:31:00.000Z"),
"Technique-Electrique_Prises de Courant_Power1[W]" : 14.0,
"Technique-Electrique_Eclairage_Power2[W]" : 360.0,
"Technique-Electrique_Electroménager_Power3[W]" : 0.0,
"Technique-Electrique_VMC Aldes_Power4[W]" : 14.0,
"Technique-Electrique_VMC Unelvent_Power5[W]" : 8.0
}
/* 7 */
{
"_id" : ObjectId("592d3a6cd69b643ac0cae396"),
"timestamp" : ISODate("2017-01-30T09:32:00.000Z"),
"Technique-Electrique_Prises de Courant_Power1[W]" : 15.0,
"Technique-Electrique_Eclairage_Power2[W]" : 365.0,
"Technique-Electrique_Electroménager_Power3[W]" : 0.0,
"Technique-Electrique_VMC Aldes_Power4[W]" : 14.0,
"Technique-Electrique_VMC Unelvent_Power5[W]" : 8.0
}
There are an "_id", a "timestamp" and multiple sensors fields. The number of sensor is not consistent. With an interface, I chose numerous sensors which I want to include in my query. This choice is stored in a list where each item is the name of the sensor.
example :
self.chosenSensors = ["Technique-Electrique_VMC Aldes_Power4[W]", "Technique-Electrique_VMC Unelvent_Power5[W]"]
I would like to calculate the average of values of each chosen sensor. I already did it but I do a query for each sensor.
In the following example, I show you that. (Don't consider the date aggregation, it is the next step)
page2.currentColl].aggregate([{"$match":{chosenSensor:{"$exists": True}}}, {"$group":{"_id":{"year":{"$year":"$timestamp"}, "month":{"$month":"$timestamp"}}, "average":{"$avg": chosenSensorAverage}}}])
Result (each average is in a new document):
RDC-ChambreEnfants_CO2_GAS_CONCENTRATION[ppm]
{'_id': {'year': 2017, 'month': 4}, 'average': 1475.3685814315352}
{'_id': {'year': 2017, 'month': 3}, 'average': 1374.3771154414906}
RDC-ChambreEnfants_Humidité_HUMIDITY[%]
{'_id': {'year': 2017, 'month': 4}, 'average': 37.55591753379364}
{'_id': {'year': 2017, 'month': 3}, 'average': 37.459350662153724}
What I would like to get is the following :
{
"Avg_Technique-Meteo_Direction moyenne du vent_Mean value wind direction[]" : 0.0,
"Avg_Technique-Meteo_Précipitations_Precipitation status[]" : 0.0,
"Avg_Technique-Meteo_Direction du vent_Wind direction[]" : 0.0
"Avg_Technique-Electrique_Prises de Courant_Power1[W]" : 14.5,
"Avg_Technique-Electrique_Eclairage_Power2[W]" : 362.5,
"Avg_Technique-Electrique_Electroménager_Power3[W]" : 0.0,
"Avg_Technique-Electrique_VMC Aldes_Power4[W]" : 14.0,
"Avg_Technique-Electrique_VMC Unelvent_Power5[W]" : 8.0
}
I've been given a hint which is (By Neil Lunn) :
You could possibly make the statement longer and get the "counts" and "sums" for each using $ifNull to determine when to increment. Then you would $divide "after" the $group pipeline stage to get the final "average".
As noted, the "key names" seem the larger problem to me and would probably be better handled by moving them to "values" within elements of an array
The first problem for me is that I don't know how to use my sensors list in the query. Other problems will probably come after this one is resolved.
Outline of Concept
What I was basically saying in the very brief comment is that instead for issuing a separate aggregation query for every sensor "key" name, you can put it in ONE, as long as you calculate the "averages" correctly.
Of course the problem in your data is that the "keys" are not present in all documents. So to get the correct "average", we cannot just use
$avg
since it would count "ALL" documents, whether the key was present or not.So instead we break up the "math", and do a
$group
for the TotalCount
and totalSum
of each key first. This uses$ifNull
to test for the presence of the field, and also$cond
to alternate values to return.The
$cond
operator is a "ternary" operator which means where the first "if" condition istrue
, "then" the second argument is returned, "else" the third argument is returned.So the point of the ternary in the
"Count"
is to work out:After the
$group
is done, in order to get theAverage
we use$divide
on the two numbers produced for each key within a separate$project
stage.The end result is the "average" for every key that you supply, and this considered only adding values and counts for documents where the field was actually present.
So putting all the keys in the one aggregation statement will save you a lot of time and resources on processing.
Dynamic Generation of Pipeline
So to do this "dynamically" in python, start with the list:
Which generates the same as the full listing above for a given list of "sensors".