Elasticsearch - group by day of week and hour

2019-04-08 18:23发布

问题:

I need to do get some data grouped by day of week and hour, for example

curl -XGET http://localhost:9200/testing/hello/_search?pretty=true -d '
{
        "size": 0,
        "aggs": {
          "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "hour",
                "format": "E - k"
            }
          }
        }
}
'

Gives me this:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2857,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "articles_over_time" : {
      "buckets" : [ {
        "key_as_string" : "Fri - 17",
        "key" : 1391792400000,
        "doc_count" : 6
      },
     ...
      {
        "key_as_string" : "Wed - 22",
        "key" : 1411596000000,
        "doc_count" : 1
      }, {
        "key_as_string" : "Wed - 22",
        "key" : 1411632000000,
        "doc_count" : 1
      } ]
    }
  }
}

Now I need to summarize doc counts by this value "Wed - 22", how can I do this? Maybe some another approach?

回答1:

The same kind of problem has been solved in this thread.

Adapting the solution to your problem, we need to make a script to convert the date into the hour of day and day of week:

Date date = new Date(doc['date'].value) ; 
java.text.SimpleDateFormat format = new java.text.SimpleDateFormat('EEE, HH');
format.format(date)

And use it in a query:

{
    "aggs": {
        "perWeekDay": {
            "terms": {
                "script": "Date date = new Date(doc['date'].value) ;java.text.SimpleDateFormat format = new java.text.SimpleDateFormat('EEE, HH');format.format(date)"
            }
        }
    }
}


回答2:

You can try doing terms aggregation on "key_as_string" field from the aggregation results using sub aggregation.

Hope that helps.



回答3:

This is because you are using an interval of 'hour', but, the date format is 'day' (E - k).

Change your interval to be 'day', and you'll no longer get separate buckets for 'Weds - 22'.

Or, if you do want per hour, then change your format to include the hour field.