How to calculate difference between metrics in dif

Posted 2019-09-17 07:03

Question:

I want to calculate the difference of nested aggregations between two dates.

More concretely, given the request and response below, is it possible to calculate date_1.buckets.field_1.buckets.field_2.buckets.field_3.value - date_2.buckets.field_1.buckets.field_2.buckets.field_3.value? Can this be done with elasticsearch v1.0.1?

The aggregation query request looks like this:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "date": [
                  "2014-08-18 00:00:00.0",
                  "2014-08-15 00:00:00.0"
                ]
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "date_1": {
      "filter": {
        "terms": {
          "date": [
            "2014-08-18 00:00:00.0"
          ]
        }
      },
      "aggs": {
        "my_agg_1": {
          "terms": {
            "field": "field_1",
            "size": 2147483647,
            "order": {
              "_term": "desc"
            }
          },
          "aggs": {
            "my_agg_2": {
              "terms": {
                "field": "field_2",
                "size": 2147483647,
                "order": {
                  "_term": "desc"
                }
              },
              "aggs": {
                "my_agg_3": {
                  "sum": {
                    "field": "field_3"
                  }
                }
              }
            }
          }
        }
      }
    },
    "date_2": {
      "filter": {
        "terms": {
          "date": [
            "2014-08-15 00:00:00.0"
          ]
        }
      },
      "aggs": {
        "my_agg_1": {
          "terms": {
            "field": "field_1",
            "size": 2147483647,
            "order": {
              "_term": "desc"
            }
          },
          "aggs": {
            "my_agg_2": {
              "terms": {
                "field": "field_2",
                "size": 2147483647,
                "order": {
                  "_term": "desc"
                }
              },
              "aggs": {
                "my_agg_3": {
                  "sum": {
                    "field": "field_3"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

And the response looks like this:

{
  "took": 236,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1646,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "date_1": {
      "doc_count": 823,
      "field_1": {
        "buckets": [
          {
            "key": "field_1_key_1",
            "doc_count": 719,
            "field_2": {
              "buckets": [
                {
                  "key": "key_1",
                  "doc_count": 275,
                  "field_3": {
                    "value": 100
                  }
                }
              ]
            }
          }
        ]
      }
    },
    "date_2": {
      "doc_count": 823,
      "field_1": {
        "buckets": [
          {
            "key": "field_1_key_1",
            "doc_count": 719,
            "field_2": {
              "buckets": [
                {
                  "key": "key_1",
                  "doc_count": 275,
                  "field_3": {
                    "value": 80
                  }
                }
              ]
            }
          }
        ]
      }
    }
  }
}

Thank you.

Answer 1:

The elasticsearch DSL does not allow arithmetic between the results of two aggregations, not even with scripts (at least up to version 1.1.1, as far as I know).

Such operations need to be handled on the client side, after processing the aggs result.
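
As a rough illustration of the client-side approach, the sketch below flattens each date's nested buckets into a dictionary keyed by (field_1 key, field_2 key) and subtracts the two. The function names are hypothetical, the bucket key names ("field_1", "field_2", "field_3") follow the response as shown in the question, and treating a bucket missing on one date as 0 is an assumption you may want to change.

```python
def sums_by_keys(date_agg, level_1="field_1", level_2="field_2", metric="field_3"):
    """Flatten one date's nested buckets into {(key_1, key_2): value}."""
    out = {}
    for b1 in date_agg[level_1]["buckets"]:
        for b2 in b1[level_2]["buckets"]:
            out[(b1["key"], b2["key"])] = b2[metric]["value"]
    return out


def diff_between_dates(aggregations, first="date_1", second="date_2"):
    """Return {(key_1, key_2): first - second}.

    Assumption: a bucket present on only one date counts as 0 on the other.
    """
    a = sums_by_keys(aggregations[first])
    b = sums_by_keys(aggregations[second])
    return {k: a.get(k, 0) - b.get(k, 0) for k in sorted(set(a) | set(b))}


# Trimmed copy of the "aggregations" part of the response in the question.
response = {
    "aggregations": {
        "date_1": {"doc_count": 823, "field_1": {"buckets": [
            {"key": "field_1_key_1", "doc_count": 719, "field_2": {"buckets": [
                {"key": "key_1", "doc_count": 275, "field_3": {"value": 100}},
            ]}},
        ]}},
        "date_2": {"doc_count": 823, "field_1": {"buckets": [
            {"key": "field_1_key_1", "doc_count": 719, "field_2": {"buckets": [
                {"key": "key_1", "doc_count": 275, "field_3": {"value": 80}},
            ]}},
        ]}},
    }
}

print(diff_between_dates(response["aggregations"]))
# {('field_1_key_1', 'key_1'): 20}
```

If the response uses the aggregation names from the request instead (my_agg_1 and so on), pass those names through the level_1, level_2 and metric parameters.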

Reference

elasticsearch aggregation to sort by ratio of aggregations



Answer 2:

With newer elasticsearch versions (e.g. 5.6.9) this is possible:

{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "filter": [
            {
              "range": {
                "date_created": {
                  "gte": "2018-06-16T00:00:00+02:00",
                  "lte": "2018-06-16T23:59:59+02:00"
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "by_millisec": {
      "range" : {
        "script" : {
          "lang": "painless",
          "source": "doc['date_delivered'][0] - doc['date_created'][0]"
        },
        "ranges" : [
          { "key": "<1sec", "to": 1000.0 },
          { "key": "1-5sec", "from": 1000.0, "to": 5000.0 },
          { "key": "5-30sec", "from": 5000.0, "to": 30000.0 },
          { "key": "30-60sec", "from": 30000.0, "to": 60000.0 },
          { "key": "1-2min", "from": 60000.0, "to": 120000.0 },
          { "key": "2-5min", "from": 120000.0, "to": 300000.0 },
          { "key": "5-10min", "from": 300000.0, "to": 600000.0 },
          { "key": ">10min", "from": 600000.0 }
        ]
      }
    }
  }
}
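
To make the bucketing in this request easier to follow, here is a small Python sketch of what the range aggregation does with each per-document difference. Note that the script subtracts two date fields of the same document, which is not the same as subtracting two aggregation results. The helper name bucket_for and the sample timestamps are made up for illustration; the ranges are copied from the request, and each range includes its "from" value and excludes its "to" value.

```python
from datetime import datetime, timedelta, timezone

# Ranges copied from the request above (values in milliseconds).
RANGES = [
    {"key": "<1sec", "to": 1000.0},
    {"key": "1-5sec", "from": 1000.0, "to": 5000.0},
    {"key": "5-30sec", "from": 5000.0, "to": 30000.0},
    {"key": "30-60sec", "from": 30000.0, "to": 60000.0},
    {"key": "1-2min", "from": 60000.0, "to": 120000.0},
    {"key": "2-5min", "from": 120000.0, "to": 300000.0},
    {"key": "5-10min", "from": 300000.0, "to": 600000.0},
    {"key": ">10min", "from": 600000.0},
]


def bucket_for(delta_ms, ranges=RANGES):
    """Return the key of the first range containing delta_ms.

    "from" is inclusive and "to" is exclusive; a missing bound is open-ended.
    """
    for r in ranges:
        lo = r.get("from", float("-inf"))
        hi = r.get("to", float("inf"))
        if lo <= delta_ms < hi:
            return r["key"]
    return None


# Hypothetical document: delivered 3.5 seconds after it was created.
tz = timezone(timedelta(hours=2))
created = datetime(2018, 6, 16, 10, 0, 0, tzinfo=tz)
delivered = created + timedelta(seconds=3, milliseconds=500)
delta_ms = (delivered - created) / timedelta(milliseconds=1)

print(delta_ms, bucket_for(delta_ms))  # 3500.0 1-5sec
```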


Answer 3:

In 1.0.1 I couldn't find anything, but in 1.4.2 you could try the scripted_metric aggregation (still experimental).

See the scripted_metric documentation page.

I am not good with the elasticsearch syntax, but I think your metric inputs would be:

init_script - just initialize an accumulator for each date:

"init_script": "_agg.d1Val = 0; _agg.d2Val = 0;"

map_script - test the date of the document and add to the right accumulator:

"map_script": "if (doc.date == firstDate) { _agg.d1Val += doc.field_3; } else { _agg.d2Val += doc.field_3; };",

reduce_script - accumulate intermediate data from various shards and return the final results:

"reduce_script": "totalD1 = 0; totalD2 = 0; for (agg in _aggs) {  totalD1 += agg.d1Val ; totalD2 += agg.d2Val ;}; return totalD1 - totalD2"

I don't think that in this case you need a combine_script.
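
The flow of these scripts can be mimicked in plain Python to check the logic before trying it on a cluster. This is only a simulation under stated assumptions, not how Elasticsearch actually executes scripts: the shard layout and sample documents are invented, the function names simply mirror the script names, and both branches of the map step add to their accumulator.

```python
def init_script():
    """Runs once per shard: create the per-shard accumulators (_agg)."""
    return {"d1Val": 0, "d2Val": 0}


def map_script(agg, doc, first_date):
    """Runs once per document: add field_3 to the accumulator for its date."""
    if doc["date"] == first_date:
        agg["d1Val"] += doc["field_3"]
    else:
        agg["d2Val"] += doc["field_3"]


def reduce_script(aggs):
    """Runs once over all per-shard states (_aggs): return date 1 minus date 2."""
    total_d1 = sum(a["d1Val"] for a in aggs)
    total_d2 = sum(a["d2Val"] for a in aggs)
    return total_d1 - total_d2


def run(shards, first_date):
    """Simulate init -> map on each shard, then reduce over the shard states."""
    shard_states = []
    for docs in shards:
        agg = init_script()
        for doc in docs:
            map_script(agg, doc, first_date)
        shard_states.append(agg)
    return reduce_script(shard_states)


# Invented sample data: two shards, two dates.
shards = [
    [{"date": "2014-08-18", "field_3": 60}, {"date": "2014-08-15", "field_3": 50}],
    [{"date": "2014-08-18", "field_3": 40}, {"date": "2014-08-15", "field_3": 30}],
]

print(run(shards, "2014-08-18"))  # (60 + 40) - (50 + 30) = 20
```

Note that this produces a single overall difference; to get one difference per field_1/field_2 bucket, the metric would have to be nested under the terms aggregations from the question.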

Of course, if you can't use 1.4.2 then this is no help :-)