I have a large document store in Elasticsearch and would like to retrieve the distinct filter values for display in HTML drop-downs.
An example would be something like:
[
  {
    "name": "John Doe",
    "departments": [
      {
        "name": "Accounts"
      },
      {
        "name": "Management"
      }
    ]
  },
  {
    "name": "Jane Smith",
    "departments": [
      {
        "name": "IT"
      },
      {
        "name": "Management"
      }
    ]
  }
]
The drop-down should list the departments, i.e. IT, Accounts and Management.
Would some kind person please point me in the right direction for retrieving a distinct list of departments from elasticsearch?
Thanks
This is a job for a terms aggregation (documentation).
You can get the distinct departments values like this:
POST company/employee/_search
{
  "size": 0,
  "aggs": {
    "by_departments": {
      "terms": {
        "field": "departments.name",
        "size": 0 // see note 1
      }
    }
  }
}
Which, for your example, outputs:
{
  ...
  "aggregations": {
    "by_departments": {
      "buckets": [
        {
          "key": "management", // see note 2
          "doc_count": 2
        },
        {
          "key": "accounts",
          "doc_count": 1
        },
        {
          "key": "it",
          "doc_count": 1
        }
      ]
    }
  }
}
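To make what the terms aggregation computes more concrete, here is a small in-memory Python sketch (not an Elasticsearch client) that builds the same kind of buckets from the example documents. It assumes the department names are kept verbatim, as with a not_analyzed field, so the keys keep their original case, unlike the lowercased keys above (see note 2). Each bucket's doc_count is the number of documents containing the value, and buckets are ordered by doc_count descending, then key ascending, which matches the order shown in the outputs here. The helper name terms_buckets is hypothetical.

```python
from collections import Counter


def terms_buckets(docs, field="departments", subfield="name"):
    """Hypothetical helper mimicking a terms aggregation on departments.name.

    Counts how many documents contain each distinct value (a document that
    lists the same department twice is still counted once), then orders the
    buckets by doc_count descending and key ascending.
    """
    counts = Counter()
    for doc in docs:
        # Collect this document's distinct values before counting them.
        values = {item[subfield] for item in doc.get(field, [])}
        counts.update(values)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


employees = [
    {"name": "John Doe", "departments": [{"name": "Accounts"}, {"name": "Management"}]},
    {"name": "Jane Smith", "departments": [{"name": "IT"}, {"name": "Management"}]},
]

for bucket in terms_buckets(employees):
    print(bucket["key"], bucket["doc_count"])
```

Counting documents rather than raw occurrences is what makes doc_count usable as "how many employees belong to this department".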
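Two additional notes:
- Note 1: setting size to 0 sets the maximum number of buckets to Integer.MAX_VALUE. Don't use it if departments has a very large number of distinct values. (This applies to Elasticsearch 1.x and 2.x; from 5.0 on, size: 0 is no longer accepted and you must give an explicit size.)
- Note 2: the keys are the terms produced by analyzing the departments values, which is why they are lowercased. Be sure to run your terms aggregation on a field mapped as not_analyzed (in 5.0 and later, the equivalent is a keyword field).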
For example, with the default mapping (where departments.name is an analyzed string), adding this employee:
{
  "name": "Bill Gates",
  "departments": [
    {
      "name": "IT"
    },
    {
      "name": "Human Resource"
    }
  ]
}
gives a result like this:
{
  ...
  "aggregations": {
    "by_departments": {
      "buckets": [
        {
          "key": "it",
          "doc_count": 2
        },
        {
          "key": "management",
          "doc_count": 2
        },
        {
          "key": "accounts",
          "doc_count": 1
        },
        {
          "key": "human",
          "doc_count": 1
        },
        {
          "key": "resource",
          "doc_count": 1
        }
      ]
    }
  }
}
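The split of "Human Resource" into human and resource comes from the analyzer. The Python sketch below approximates the standard analyzer with a regular expression (lowercase, then split on runs of characters other than ASCII letters and digits); the real analyzer follows Unicode text segmentation rules, so treat this only as an illustration. Counting the resulting tokens per document reproduces the buckets above. Both function names are hypothetical.

```python
import re
from collections import Counter


def approx_standard_analyze(text):
    """Rough stand-in for the standard analyzer (an assumption, not the real one).

    Lowercases the text and splits it on runs of non-alphanumeric ASCII characters.
    """
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def analyzed_terms_buckets(docs):
    """Count documents per token produced from departments.name values."""
    counts = Counter()
    for doc in docs:
        tokens = set()
        for dept in doc.get("departments", []):
            tokens.update(approx_standard_analyze(dept["name"]))
        counts.update(tokens)
    # Order by doc_count descending, then token ascending.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


employees = [
    {"name": "John Doe", "departments": [{"name": "Accounts"}, {"name": "Management"}]},
    {"name": "Jane Smith", "departments": [{"name": "IT"}, {"name": "Management"}]},
    {"name": "Bill Gates", "departments": [{"name": "IT"}, {"name": "Human Resource"}]},
]

for bucket in analyzed_terms_buckets(employees):
    print(bucket["key"], bucket["doc_count"])
```

This is exactly the behavior you want to avoid for drop-down values: a single department name ends up spread over several buckets.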
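With a correct mapping, created before any documents are indexed (an existing field cannot be switched from analyzed to not_analyzed in place, so an existing index has to be recreated and its documents reindexed):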
POST company
{
  "mappings": {
    "employee": {
      "properties": {
        "name": {
          "type": "string"
        },
        "departments": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}
The same request then outputs:
{
  ...
  "aggregations": {
    "by_departments": {
      "buckets": [
        {
          "key": "IT",
          "doc_count": 2
        },
        {
          "key": "Management",
          "doc_count": 2
        },
        {
          "key": "Accounts",
          "doc_count": 1
        },
        {
          "key": "Human Resource",
          "doc_count": 1
        }
      ]
    }
  }
}
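Since the goal is to fill an HTML drop-down, here is a minimal Python sketch that turns the buckets of a response like the one above into option elements. It assumes the response body has already been parsed into a dict (for example with json.loads); the function name bucket_options is hypothetical, and html.escape guards against department names containing characters such as & or <.

```python
import html
import json


def bucket_options(response, agg_name="by_departments"):
    """Build <option> tags from the buckets of a terms aggregation response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    options = []
    for bucket in buckets:
        # Escape the key so it is safe both as an attribute value and as label text.
        key = html.escape(bucket["key"], quote=True)
        options.append(f'<option value="{key}">{key}</option>')
    return "\n".join(options)


raw = """{"aggregations": {"by_departments": {"buckets": [
  {"key": "IT", "doc_count": 2},
  {"key": "Management", "doc_count": 2},
  {"key": "Accounts", "doc_count": 1},
  {"key": "Human Resource", "doc_count": 1}
]}}}"""

print(bucket_options(json.loads(raw)))
```

The options keep the bucket order (most common department first); sort the keys yourself if the drop-down should be alphabetical.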
Hope this helps!