I've got a bunch of documents indexed in ElasticSearch, and I need to get the following data:
For each month, get the average number of documents per working day of the month (or if impossible, use 20 days as the default).
I already aggregated my data into months buckets using the date histogram
aggregation. I tried to nest a stats
bucket, but this aggregations uses data extracted from the document's field, not from the parent bucket.
Here is my query so far:
{
"query": {
"match_all": {}
},
"aggs": {
"docs_per_month": {
"date_histogram": {
"field": "created_date",
"interval": "month",
"min_doc_count": 0
}
"aggs": {
'???': '???'
}
}
}
}
edit
To make my question clearer, what I need is:
- Get the total of numbers of documents created for the month (which is already done thanks to the
date_histogram
aggregation) - Get the number of working days for the month
- Divide the first by the second.
What you basically need is something like this (which doesn't work, as it's not an available feature):
It doesn't work because there is not way of accessing the
doc_count
from the "parent" aggregation.But, this will be possible in the 2.x branch of Elasticsearch and, at the moment, it's being actively developed: https://github.com/elastic/elasticsearch/issues/8110 This new feature will add a second layer of manipulation over the results (buckets) of an aggregation and it's not only your usecase, but many others.
Unless you want to try some ideas out there or perform your own calculations in your app, you need to wait for this feature.
For anyone still interested, you can now do with with the
avg_bucket
aggregation. Its still a bit tricky, because you cannot simply run theavg_bucket
on adate_historgram
aggregation result, but with a secondaryvalue_count
aggregation with some unique value and it works fine :)You want to exclude documents with timestamp on Saturday and Sunday, so you can exclude those documents in your query using a script
You may not need the first aggregation by month, since you already have this information using day interval
BTW you need to make sure dynamic scripting is enabled by adding this to your
elasticsearch.yml
configurationOr add a groovy script under /config/scripts and use a filtered query with a script in filter
There is a pretty convoluted solution and not really performant, using the following
scripted_metric
aggregation.Let's detail each script below.
What I'm doing in
init_script
is creating a map of the number of business days for each month since 1970 and storing that in the_agg.allbizdays
map.In
map_script
, I'm simply retrieving the number of weekdays for the month of each document;In
combine_script
, I'm summing up the average doc count for each shardAnd finally in
reduce_script
, I'm summing up the average doc count for each node:Again I think it's pretty convoluted and as Andrei rightly said it, it is probably better to wait for 2.0 to make it work the way it should, but in the meantime you have this solution, if you need it.