Elasticsearch aggregation turns results to lowerca

2019-03-05 05:15发布

问题:

I've been playing with ElasticSearch a little and found an issue when doing aggregations.

I have two endpoints, /A and /B. In the first one I have parents for the second one. So, one or many objects in B must belong to one object in A. Therefore, objects in B have an attribute "parentId" with parent index generated by ElasticSearch.

I want to filter parents in A by children attributes of B. In order to do it, I first filter children in B by attributes and get its unique parent ids that I'll later use to get parents.

I send this request:

POST http://localhost:9200/test/B/_search
{
    "query": {
        "query_string": {
            "default_field": "name",
            "query": "derp2*"
        }
    },
    "aggregations": {
        "ids": {
            "terms": {
                "field": "parentId"
            }
        }
    }
}

And get this response:

{
  "took": 91,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "test",
        "_type": "child",
        "_id": "AU_fjH5u40Hx1Kh6rfQG",
        "_score": 1,
        "_source": {
          "parentId": "AU_ffvwM40Hx1Kh6rfQA",
          "name": "derp2child2"
        }
      },
      {
        "_index": "test",
        "_type": "child",
        "_id": "AU_fjD_U40Hx1Kh6rfQF",
        "_score": 1,
        "_source": {
          "parentId": "AU_ffvwM40Hx1Kh6rfQA",
          "name": "derp2child1"
        }
      },
      {
        "_index": "test",
        "_type": "child",
        "_id": "AU_fjKqf40Hx1Kh6rfQH",
        "_score": 1,
        "_source": {
          "parentId": "AU_ffvwM40Hx1Kh6rfQA",
          "name": "derp2child3"
        }
      }
    ]
  },
  "aggregations": {
    "ids": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "au_ffvwm40hx1kh6rfqa",
          "doc_count": 3
        }
      ]
    }
  }
}

For some reason, the filtered key is returned in lowercase, hence not being able to request parent to ElasticSearch

GET http://localhost:9200/test/A/au_ffvwm40hx1kh6rfqa

Response:
{
  "_index": "test",
  "_type": "A",
  "_id": "au_ffvwm40hx1kh6rfqa",
  "found": false
}

Any ideas on why is this happening?

回答1:

The difference between the hits and the results of the aggregations is that the aggregations work on the created terms. They will also return the terms. The hits return the original source.

How are these terms created? Based on the chosen analyser, which in your case is the default one, the standard analyser. One of the things this analyser does is lowercasing all the characters of the terms. Like mentioned by Andrei, you should configure the field parentId to be not_analyzed.

PUT test
{
  "mappings": {
    "B": {
      "properties": {
        "parentId": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }   
}