Find distinct values, not distinct counts in elast

Posted 2019-01-14 11:25

Question:

The Elasticsearch documentation suggests* that this piece of code

*The documentation has since been fixed.

GET /cars/transactions/_search?search_type=count
{
  "aggs": {
    "distinct_colors": {
      "cardinality": {
        "field": "color"
      }
    }
  }
}

corresponds to the SQL query

SELECT DISTINCT(color) FROM cars

but it actually corresponds to

SELECT COUNT(DISTINCT(color)) FROM cars

I don't want to know how many distinct values there are; I want the distinct values themselves. Does anyone know how to achieve that?

Answer 1:

Use a terms aggregation on the color field. Pay attention to how the field you want distinct values for is analyzed: make sure it is not tokenized at index time, otherwise each entry in the aggregation will be an individual term from the field content rather than the whole value.

If you still want tokenization AND want to use the terms aggregation, look at indexing that field as not_analyzed as well, for example as a not_analyzed sub-field using multi fields.

Terms aggregation for cars (size caps how many distinct values are returned; the default is 10):

GET /cars/transactions/_search?search_type=count
{
  "aggs": {
    "distinct_colors": {
      "terms": {
        "field": "color",
        "size": 1000
      }
    }
  }
}
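
The multi-field setup mentioned above could look roughly like the following. This is only a sketch: it assumes the pre-5.x mapping syntax that goes with the search_type=count requests in this thread, and the sub-field name raw is a hypothetical choice. The mapping has to be in place before documents are indexed, so an existing index would likely need to be reindexed.

```json
# Assumption: pre-5.x mapping syntax; "raw" is a hypothetical sub-field name
PUT /cars
{
  "mappings": {
    "transactions": {
      "properties": {
        "color": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}

# Aggregate on the untokenized sub-field instead of the analyzed field
GET /cars/transactions/_search?search_type=count
{
  "aggs": {
    "distinct_colors": {
      "terms": {
        "field": "color.raw",
        "size": 1000
      }
    }
  }
}
```

In the response, the distinct values are the key of each entry under aggregations.distinct_colors.buckets, and doc_count is the number of documents holding that value.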


Answer 2:

To update the excellent answer from Andrei Stefan: the query parameter search_type=count is no longer supported in Elasticsearch 5. The new way of doing this is to add "size": 0 to the request body:

GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "distinct_colors": {
      "terms": {
        "field": "color",
        "size": 1000
      }
    }
  }
}
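
One more thing to watch in Elasticsearch 5: the string type was split into text (analyzed) and keyword (not analyzed), and a terms aggregation on a text field is refused unless fielddata is enabled. The sketch below assumes the color field was created by the default dynamic mapping, which adds a keyword sub-field named keyword; if your mapping is different, use whichever keyword field you defined.

```json
# Assumption: "color" was dynamically mapped, so a "color.keyword" sub-field exists
GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "distinct_colors": {
      "terms": {
        "field": "color.keyword",
        "size": 1000
      }
    }
  }
}
```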


Answer 3:

Personally, I found both of the answers arcane and hopelessly complex once I wanted to add multiple filters.

For me, what made sense was to go to the Discover tab in Kibana and apply the filters I wanted. I then saved my search.

Then, I created a new Bar Chart visualization using my saved search. I modified the X-Axis to use a Terms aggregation on my field of interest (in my case, Usernames), ordered by Count. Make sure the Size is something large, like 500.

You should be able to get the results in tabular form underneath your chart. Simple, and no complex JSON programming. Just a series of clicks. You can even save the visualization for later.
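
For comparison, if you do want to stay with a raw request, filters can be combined with the terms aggregation by adding a query next to aggs. This is a minimal sketch in the Elasticsearch 5 style from the previous answer; the make and price fields and their values are hypothetical and only stand in for whatever filters you would apply in Discover.

```json
# Assumption: "make" and "price" are hypothetical fields used only to illustrate filtering
GET /cars/transactions/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "make": "honda" } },
        { "range": { "price": { "gte": 10000 } } }
      ]
    }
  },
  "aggs": {
    "distinct_colors": {
      "terms": {
        "field": "color",
        "size": 1000
      }
    }
  }
}
```

The aggregation only sees documents that pass every clause in the filter list, so the buckets contain the distinct colors of the filtered set.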