Weighted Average for nested aggregation in Elastic

2019-08-20 16:41发布

问题:

I am trying to obtain the weighted average by aggregating a nested list. Each document has details of a single student, and the subjects vary across each student and each subject has different weights.

I am trying to calculate the weighted average subject-wise.

My documents are of the form -

[{'class': '10th',
 'id': '1',
'subject': [{'marks': 60, 'name': 's1', 'weight': 30},
         {'marks': 80, 'name': 's2', 'weight': 70}]},
{'class': '11th',
 'id': '2',
'subject': [{'marks': 43, 'name': 's10', 'weight': 40},
         {'marks': 54, 'name': 's20', 'weight': 60}]},
{'class': '10th',
 'id': '3',
'subject': [{'marks': 43, 'name': 's1', 'weight': 20},
         {'marks': 54, 'name': 's20', 'weight': 80}]},
{'class': '10th',
 'id': '4',
'subject': [{'marks': 69, 'name': 's10', 'weight': 30},
         {'marks': 45, 'name': 's2', 'weight': 70}]}]

Here s1,s10,s2,s20 are the subjects. For a given class, say "10th" I am trying to aggregate the weighted average.

The query I make is

GET students_try/_search
{
 "query": {
"match": {
  "class": "10th"
}
 },
"aggs": {
"subjects": {
  "nested": {
    "path": "subject"
  },
  "aggs": {
    "subjects": {
      "terms": {
        "field": "subject.name"
      },
      "aggs": {
        "avg_score": {
          "avg": {
            "field": "subject.marks"
          }
        },
        "weighted_grade": {
          "weighted_avg": {
            "value": {
              "field": "subject.marks"
            },
            "weight": {
              "field": "subject.weight"
            }
          }
        }
      }
    }
  }
}
  },
 "size": 0
}

The error I get is -

{u'error': {u'col': 211,
        u'line': 1,
        u'reason': u'Unknown BaseAggregationBuilder [weighted_avg]',
        u'root_cause': [{u'col': 211,
                         u'line': 1,
                         u'reason': u'Unknown BaseAggregationBuilder [weighted_avg]',
                         u'type': u'unknown_named_object_exception'}],
        u'type': u'unknown_named_object_exception'},
 u'status': 400}

I am not sure what is causing the error.

回答1:

Yes the weighted average as mentioned by Nishant only appears post 6.4 as mentioned in the section A few others in this link detailing 6.4 release

However I've come up with the below script using Bucket Script Aggregation which calculates the weighted avg for each and every bucket :

POST <your_index_name>/_search
{
  "size": 0,
  "query": {
    "match": {
      "class": "10th"
    }
  },
  "aggs": {
    "subjects": {
      "nested": {
        "path": "subject"
      },
      "aggs": {
        "subjects": {
          "terms": {
            "field": "subject.name.keyword"
          },
          "aggs": {
            "avg_score": {
              "avg": {
                "field": "subject.marks"
              }
            },
            "sum_score":{
              "sum_productOfMarksAndWeight": {
                "script": "doc['subject.marks'].value * doc['subject.weight'].value"
              }
            },
            "sum_weights": {
              "sum": {
                "field": "subject.weight"
              }
            },
            "weighted_avg":{
              "bucket_script": {
                "buckets_path": {
                  "sumScore": "sum_productOfMarksAndWeight",
                  "sumWeights": "sum_weights"
                },
                "script": "params.sumScore/params.sumWeights"
              }
            }
          }             
        }
      }
    }
  }
}

If you look at the above aggregation carefully, for every bucket I've calculated the sum of weights and sum of product of weights and marks using Sum Aggregation and then I've used these two aggregations to calculate the weighted aggregation.

Below is how your response appears. Notice that there is a caveat that you'd also see the sum of weights and sum of product of weights and marks in the aggregation result.

Response

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "subjects": {
      "doc_count": 6,
      "subjects": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "s1",
            "doc_count": 2,
            "sum_weights": {
              "value": 50
            },
            "sum_productOfMarksAndWeight": {
              "value": 2660
            },
            "avg_score": {
              "value": 51.5
            },
            "weighted_avg": {
              "value": 53.2
            }
          },
          {
            "key": "s2",
            "doc_count": 2,
            "sum_weights": {
              "value": 140
            },
            "sum_productOfMarksAndWeight": {
              "value": 8750
            },
            "avg_score": {
              "value": 62.5
            },
            "weighted_avg": {
              "value": 62.5
            }
          },
          {
            "key": "s10",
            "doc_count": 1,
            "sum_weights": {
              "value": 30
            },
            "sum_productOfMarksAndWeight": {
              "value": 2070
            },
            "avg_score": {
              "value": 69
            },
            "weighted_avg": {
              "value": 69
            }
          },
          {
            "key": "s20",
            "doc_count": 1,
            "sum_weights": {
              "value": 80
            },
            "sum_productOfMarksAndWeight": {
              "value": 4320
            },
            "avg_score": {
              "value": 54
            },
            "weighted_avg": {
              "value": 54
            }
          }
        ]
      }
    }
  }
}

I hope this helps, let me know if it doesn't and if you think this solves what you are looking for, please go ahead and accept this answer ;-)