I am trying to obtain the weighted average by aggregating a nested list.
Each document has details of a single student, and the subjects vary across each student and each subject has different weights.
I am trying to calculate the weighted average subject-wise.
My documents are of the form -
[{'class': '10th',
'id': '1',
'subject': [{'marks': 60, 'name': 's1', 'weight': 30},
{'marks': 80, 'name': 's2', 'weight': 70}]},
{'class': '11th',
'id': '2',
'subject': [{'marks': 43, 'name': 's10', 'weight': 40},
{'marks': 54, 'name': 's20', 'weight': 60}]},
{'class': '10th',
'id': '3',
'subject': [{'marks': 43, 'name': 's1', 'weight': 20},
{'marks': 54, 'name': 's20', 'weight': 80}]},
{'class': '10th',
'id': '4',
'subject': [{'marks': 69, 'name': 's10', 'weight': 30},
{'marks': 45, 'name': 's2', 'weight': 70}]}]
Here s1,s10,s2,s20 are the subjects. For a given class, say "10th" I am trying to aggregate the weighted average.
The query I make is
GET students_try/_search
{
"query": {
"match": {
"class": "10th"
}
},
"aggs": {
"subjects": {
"nested": {
"path": "subject"
},
"aggs": {
"subjects": {
"terms": {
"field": "subject.name"
},
"aggs": {
"avg_score": {
"avg": {
"field": "subject.marks"
}
},
"weighted_grade": {
"weighted_avg": {
"value": {
"field": "subject.marks"
},
"weight": {
"field": "subject.weight"
}
}
}
}
}
}
}
},
"size": 0
}
The error I get is -
{u'error': {u'col': 211,
u'line': 1,
u'reason': u'Unknown BaseAggregationBuilder [weighted_avg]',
u'root_cause': [{u'col': 211,
u'line': 1,
u'reason': u'Unknown BaseAggregationBuilder [weighted_avg]',
u'type': u'unknown_named_object_exception'}],
u'type': u'unknown_named_object_exception'},
u'status': 400}
I am not sure what is causing the error.
Yes the weighted average as mentioned by Nishant only appears post 6.4 as mentioned in the section A few others
in this link detailing 6.4 release
However I've come up with the below script using Bucket Script Aggregation which calculates the weighted avg for each and every bucket :
POST <your_index_name>/_search
{
"size": 0,
"query": {
"match": {
"class": "10th"
}
},
"aggs": {
"subjects": {
"nested": {
"path": "subject"
},
"aggs": {
"subjects": {
"terms": {
"field": "subject.name.keyword"
},
"aggs": {
"avg_score": {
"avg": {
"field": "subject.marks"
}
},
"sum_score":{
"sum_productOfMarksAndWeight": {
"script": "doc['subject.marks'].value * doc['subject.weight'].value"
}
},
"sum_weights": {
"sum": {
"field": "subject.weight"
}
},
"weighted_avg":{
"bucket_script": {
"buckets_path": {
"sumScore": "sum_productOfMarksAndWeight",
"sumWeights": "sum_weights"
},
"script": "params.sumScore/params.sumWeights"
}
}
}
}
}
}
}
}
If you look at the above aggregation carefully, for every bucket I've calculated the sum of weights
and sum of product of weights and marks
using Sum Aggregation and then I've used these two aggregations to calculate the weighted aggregation.
Below is how your response appears. Notice that there is a caveat that you'd also see the sum of weights
and sum of product of weights and marks
in the aggregation result.
Response
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"subjects": {
"doc_count": 6,
"subjects": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "s1",
"doc_count": 2,
"sum_weights": {
"value": 50
},
"sum_productOfMarksAndWeight": {
"value": 2660
},
"avg_score": {
"value": 51.5
},
"weighted_avg": {
"value": 53.2
}
},
{
"key": "s2",
"doc_count": 2,
"sum_weights": {
"value": 140
},
"sum_productOfMarksAndWeight": {
"value": 8750
},
"avg_score": {
"value": 62.5
},
"weighted_avg": {
"value": 62.5
}
},
{
"key": "s10",
"doc_count": 1,
"sum_weights": {
"value": 30
},
"sum_productOfMarksAndWeight": {
"value": 2070
},
"avg_score": {
"value": 69
},
"weighted_avg": {
"value": 69
}
},
{
"key": "s20",
"doc_count": 1,
"sum_weights": {
"value": 80
},
"sum_productOfMarksAndWeight": {
"value": 4320
},
"avg_score": {
"value": 54
},
"weighted_avg": {
"value": 54
}
}
]
}
}
}
}
I hope this helps, let me know if it doesn't and if you think this solves what you are looking for, please go ahead and accept this answer ;-)