I have a set of 2.8 million docs with sets of tags that I'm querying with Elasticsearch, but many of these docs can be grouped together by one ID. I want to query my data using the tags, and then aggregate the results by the repeating ID. My searches often match tens of thousands of documents, but I only want to aggregate the top 100 results of the search. How can I constrain an aggregation to only the top 100 results from a query?
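Use the sampler aggregation:
The sampler aggregation limits its sub-aggregations to the top-scoring documents matching the query (by default 100 per shard, controlled by `shard_size`), and a terms sub-aggregation then buckets those documents by ID. Note that `shard_size` applies to each shard, so an index with several shards can contribute more than 100 documents in total to the sample.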
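To make the semantics concrete, here is a minimal client-side sketch of what the sampler does on a single shard: keep the highest-scoring hits, then count them per group ID. The `(group_id, score)` tuples and the `sample_then_bucket` name are hypothetical stand-ins for search hits; Elasticsearch does this work server-side, per shard.

```python
from collections import Counter
from typing import Iterable, Tuple


def sample_then_bucket(hits: Iterable[Tuple[str, float]], sample_size: int = 100) -> Counter:
    """Keep the sample_size highest-scoring hits, then count them per group ID.

    Each hit is a hypothetical (group_id, score) pair standing in for a search hit.
    """
    # Sort by score, highest first, and keep only the top sample_size hits.
    top_hits = sorted(hits, key=lambda hit: hit[1], reverse=True)[:sample_size]
    # Bucket the sampled hits by their group ID.
    return Counter(group_id for group_id, _score in top_hits)


hits = [("a", 3.0), ("b", 2.5), ("a", 2.0), ("c", 1.0)]
print(sample_then_bucket(hits, sample_size=3))
```

With `sample_size=3`, the lowest-scoring hit (group `c`) falls outside the sample, so only `a` and `b` get buckets.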
Optionally, you can switch to the `diversified_sampler` aggregation and use its `field` or `script` and `max_docs_per_value` settings to control the maximum number of documents collected on any one shard that share a common value. On the terms sub-aggregation, you can use the `min_doc_count` parameter to return only buckets that contain at least that many documents. The `size` parameter defines how many term buckets are returned out of the overall terms list.
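A sketch of a search body combining these options follows, assuming the `diversified_sampler` aggregation (which is the variant that accepts `field` and `max_docs_per_value`). The field names `tags`, `group_id`, and `source`, and the function `build_diversified_request`, are hypothetical; the grouping and diversifying fields would need to be keyword (or otherwise aggregatable) fields in your mapping.

```python
import json


def build_diversified_request(query, group_field, diversify_field,
                              shard_size=100, max_docs_per_value=3, min_doc_count=2):
    """Build a hypothetical search body: diversified sample, then bucket by group ID."""
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "query": query,
        "aggs": {
            "sample": {
                "diversified_sampler": {
                    "shard_size": shard_size,  # top-scoring docs kept per shard
                    "field": diversify_field,  # value used to diversify the sample
                    "max_docs_per_value": max_docs_per_value,
                },
                "aggs": {
                    "by_id": {
                        "terms": {
                            "field": group_field,
                            "size": 100,  # number of ID buckets to return
                            "min_doc_count": min_doc_count,  # drop IDs seen fewer times
                        }
                    }
                },
            }
        },
    }


body = build_diversified_request({"terms": {"tags": ["red", "large"]}}, "group_id", "source")
print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body of a `_search` request against your index.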
By default, the node coordinating the search asks each shard for its own top `size` term buckets, and once all shards respond, it reduces them to the final list returned to the client. This means that if the number of unique terms is greater than `size`, the returned list can be inaccurate: term counts may be slightly off, and a term that belongs in the top `size` buckets may not be returned at all.
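The inaccuracy described above can be reproduced with a toy model of two shards. The per-shard term counts below are made-up illustration data: each shard reports only its own top `size` terms, and the merged result differs from the exact top terms.

```python
from collections import Counter


def merged_top_terms(shards, size):
    """Mimic the coordinating node: merge each shard's own top `size` terms."""
    merged = Counter()
    for shard_counts in shards:
        for term, count in Counter(shard_counts).most_common(size):
            merged[term] += count
    return merged.most_common(size)


def exact_top_terms(shards, size):
    """Compute the true top `size` terms from the full per-shard counts."""
    total = Counter()
    for shard_counts in shards:
        total.update(shard_counts)
    return total.most_common(size)


# Hypothetical term counts on two shards.
shards = [{"a": 10, "b": 9, "c": 8}, {"d": 12, "c": 9, "a": 1}]
print(merged_top_terms(shards, 2))
print(exact_top_terms(shards, 2))
```

Term `c` has the highest true count (17) but is missing from the merged list because it was not in shard 1's top 2, and the merged count for `a` (10) undercounts its true total (11).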
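If set to 0, the size will be set to `Integer.MAX_VALUE` (this applied to older Elasticsearch versions; newer versions require a `size` greater than 0). Here is example code that returns the top 100: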
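A minimal sketch of such a request, written as a Python dict: a `sampler` aggregation with `shard_size` set to 100 wrapping a `terms` sub-aggregation on the group ID. The field names `tags` and `group_id` and the function `build_sampled_terms_request` are hypothetical; `group_id` is assumed to be an aggregatable keyword field. Remember that `shard_size` is per shard, so this samples up to 100 documents on each shard rather than exactly 100 overall.

```python
import json


def build_sampled_terms_request(tags, group_field, sample_size=100, bucket_count=100):
    """Build a hypothetical search body: sample the top hits, then bucket by group ID."""
    return {
        "size": 0,  # skip returning hits; only the aggregation is needed
        "query": {"terms": {"tags": list(tags)}},  # match docs having any of the tags
        "aggs": {
            "sample": {
                "sampler": {"shard_size": sample_size},  # top-scoring docs per shard
                "aggs": {
                    "by_group_id": {
                        "terms": {"field": group_field, "size": bucket_count}
                    }
                },
            }
        },
    }


body = build_sampled_terms_request(["red", "large"], "group_id")
print(json.dumps(body, indent=2))
```

The resulting buckets appear under `aggregations.sample.by_group_id.buckets` in the search response.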
You can refer to this for more information.