I have a list of communities. And I need to create aggregation query which will count all data which have the same title.
[
{
"_id": "56161cb3cbdad2e3b437fdc3",
"_type": "Comunity",
"name": "public",
"data": [
{
"title": "sonder",
"creationDate": "2015-08-22T03:43:28 -03:00",
"quantity": 0
},
{
"title": "vule",
"creationDate": "2014-05-17T12:35:01 -03:00",
"quantity": 0
},
{
"title": "omer",
"creationDate": "2015-01-31T04:54:19 -02:00",
"quantity": 3
},
{
"title": "sonder",
"creationDate": "2014-05-22T05:09:36 -03:00",
"quantity": 3
}
]
},
{
"_id": "56161cb3dae30517fc133cd9",
"_type": "Comunity",
"name": "static",
"data": [
{
"title": "vule",
"creationDate": "2014-07-01T06:32:06 -03:00",
"quantity": 5
},
{
"title": "vule",
"creationDate": "2014-01-10T12:40:28 -02:00",
"quantity": 1
},
{
"title": "vule",
"creationDate": "2014-01-09T09:33:11 -02:00",
"quantity": 3
}
]
},
{
"_id": "56161cb32f62b522355ca3c8",
"_type": "Comunity",
"name": "public",
"data": [
{
"title": "vule",
"creationDate": "2014-02-03T09:55:28 -02:00",
"quantity": 2
},
{
"title": "vule",
"creationDate": "2015-01-23T09:14:22 -02:00",
"quantity": 0
}
]
}
]
So desire result should be
[
{
title: vule,
total: 6
},
{
title: omer,
total: 1
},
{
title: sonder,
total: 1
}
]
I wrote some aggregation queries but it still not work. How can I get desire result?
PS: I tried to create nested aggregation
ejs.Request().size(0).agg(
ejs.NestedAggregation('comunities')
.path('data')
.agg(
ejs.FilterAggregation('sonder')
.filter(
ejs.TermsFilter('data.title', 'sonder')
).agg(
ejs.ValueCountAggregation('counts')
.field('data.title')
)
)
);
You need to use terms aggregations.
Now depending on your mapping there could be two ways of doing that:
1. Your data field is stored as an subdocument
You need to run a simple terms aggregation, which in RAW json looks like:
POST /test/test/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"Grouping": {
"terms": {
"field": "data.title",
"size": 0
}
}
}
}
2. Your data field is stored as an nested document
You have to add a nested subaggregation before doing terms aggregation.
POST /test/test/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"Nest": {
"nested": {
"path": "data"
},
"aggs": {
"Grouping": {
"terms": {
"field": "data.title",
"size": 0
}
}
}
}
}
}
Both will output this:
{
"took": 125,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"Nest": {
"doc_count": 9,
"Grouping": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "vule",
"doc_count": 6 -- The Total count you're looking for
},
{
"key": "sonder",
"doc_count": 2
},
{
"key": "omer",
"doc_count": 1
}
]
}
}
}
}
This, unfortunately, is just a raw query, but I imagine that it can be translated into elastic.js quite easily.
On top of that. If you're going to do aggregations, don't forget to set your fields, that you're doing aggregations on, as not_analyzed
, because it will start counting individual tokens as in documentation
I, myself, would store these documens as nested ones.
Example:
Mappings:
PUT /test
{
"mappings": {
"test": {
"properties": {
"name": {
"type": "string"
},
"data": {
"type": "nested",
"properties": {
"title": {
"type": "string",
"index": "not_analyzed",
"fields": {
"stemmed": {
"type": "string",
"analyzed": "standard"
}
}
},
"creationDate": {
"type": "date",
"format": "dateOptionalTime"
},
"quantity": {
"type": "integer"
}
}
}
}
}
}
}
Test data:
PUT /test/test/56161cb3cbdad2e3b437fdc3
{
"name": "public",
"data": [
{
"title": "sonder",
"creationDate": "2015-08-22T03:43:28",
"quantity": 0
},
{
"title": "vule",
"creationDate": "2014-05-17T12:35:01",
"quantity": 0
},
{
"title": "omer",
"creationDate": "2015-01-31T04:54:19",
"quantity": 3
},
{
"title": "sonder",
"creationDate": "2014-05-22T05:09:36",
"quantity": 3
}
]
}
PUT /test/test/56161cb3dae30517fc133cd9
{
"name": "static",
"data": [
{
"title": "vule",
"creationDate": "2014-07-01T06:32:06",
"quantity": 5
},
{
"title": "vule",
"creationDate": "2014-01-10T12:40:28",
"quantity": 1
},
{
"title": "vule",
"creationDate": "2014-01-09T09:33:11",
"quantity": 3
}
]
}
PUT /test/test/56161cb32f62b522355ca3c8
{
"name": "public",
"data": [
{
"title": "vule",
"creationDate": "2014-02-03T09:55:28",
"quantity": 2
},
{
"title": "vule",
"creationDate": "2015-01-23T09:14:22",
"quantity": 0
}
]
}
Actual query:
POST /test/test/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"Nest": {
"nested": {
"path": "data"
},
"aggs": {
"Grouping": {
"terms": {
"field": "data.title",
"size": 0
}
}
}
}
}
}
P.S. "size":0
means that I'm letting Elasticsearch output all possible terms and not limiting its output as described in documentation.
The size
parameter can be set to define how many term buckets should
be returned out of the overall terms list. By default, the node
coordinating the search process will request each shard to provide its
own top size
term buckets and once all shards respond, it will reduce
the results to the final list that will then be returned to the
client. This means that if the number of unique terms is greater than
size
, the returned list is slightly off and not accurate (it could be
that the term counts are slightly off and it could even be that a term
that should have been in the top size buckets was not returned). If
set to 0
, the size
will be set to Integer.MAX_VALUE
.