Elasticsearch: How to search with different analyz

Posted 2019-05-31 02:00

I'm using my custom analyzer autocomplete_analyzer with an edgeNGram filter. The mapping looks like this:

  "acts_as_taggable_on_tags" : {
    "acts_as_taggable_on/tag" : {
      "properties" : {
        "name" : {
          "type" : "string",
          "boost" : 10.0,
          "analyzer" : "autocomplete_analyzer"
        }
      }
    }
  }

When I search using query_string, it works like autocomplete. For example, query "lon" returns ["lon", "long", "london",...].

But sometimes I need exact matching. How can I get just the one exactly matching word "lon"? Can I use a different analyzer (e.g. simple or standard) when making a search query?

1 answer
Fickle 薄情
#2 · 2019-05-31 02:55

I think you will need to store the data in 2 separate fields. One would contain the tokens necessary for doing autocomplete queries, the other for the full search queries.

If you have only one field with the tokens [lon, lond, londo, london] then if you search against this field you cannot say "please only match the token london as this is the full word/longest token".
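To make that concrete, here is a minimal Python sketch of what an edge n-gram filter emits for one word. It is not Elasticsearch code: the function name `edge_ngrams` is made up for illustration, and `min_gram=3` is an assumption chosen only because it reproduces the token list above.

```python
def edge_ngrams(word, min_gram=3, max_gram=20):
    """Return every prefix of `word` from min_gram to max_gram characters long.

    Hypothetical stand-in for an edgeNGram token filter; the defaults are
    assumptions, not the settings of the analyzer in the question.
    """
    upper = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, upper + 1)]


tokens = edge_ngrams("london")
print(tokens)           # ['lon', 'lond', 'londo', 'london']
print("lon" in tokens)  # True: the field cannot tell a prefix from the full word
```

Once "london" has been indexed this way, the token lon looks exactly like the token a document named "lon" would produce, so nothing in that single field records which token was the whole word.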

The multi-field type handles the 2 fields nicely for you. Take a look at the Elasticsearch docs on multi-field. The 'official' documentation is pretty good on this topic, please check it out!

I would probably do this:

Mapping

"acts_as_taggable_on_tags" : {
  "acts_as_taggable_on/tag" : {
    "properties" : {
      "name" : {
        "type" : "multi_field",
        "fields" : {
          "name" : {
            "type" : "string",
            "boost" : 10.0
          },
          "autocomplete" : {
            "type" : "string",
            "analyzer" : "autocomplete_analyzer",
            "boost" : 10.0
          }
        }
      }
    }
  }
}

Querying

for autocomplete queries:

"query": {
  "query_string": {
    "query" : "lon",
    "default_field": "name.autocomplete"
  }
}

for normal queries:

"query": {
  "query_string": {
    "query" : "lon",
    "default_field": "name"
  }
}

Note the difference in "default_field".
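If it helps to see why the two sub-fields behave differently, the following Python sketch simulates them with two tiny inverted indexes. Everything in it is an illustrative assumption (the helper names, the `min_gram=3` setting, and treating the plain field as one lowercased token per name); it only mimics the idea and does not call Elasticsearch.

```python
from collections import defaultdict


def edge_ngrams(word, min_gram=3, max_gram=20):
    """Hypothetical stand-in for the edgeNGram filter (assumed settings)."""
    upper = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, upper + 1)]


def build_index(names, analyze):
    """Map each token produced by `analyze` to the set of names containing it."""
    index = defaultdict(set)
    for name in names:
        for token in analyze(name.lower()):
            index[token].add(name)
    return index


def search(index, term):
    """Look up a single term and return the matching names in sorted order."""
    return sorted(index.get(term.lower(), set()))


names = ["lon", "long", "london"]
plain_index = build_index(names, lambda w: [w])        # plays the role of "name"
autocomplete_index = build_index(names, edge_ngrams)   # plays the role of "name.autocomplete"

print(search(autocomplete_index, "lon"))  # ['lon', 'london', 'long']
print(search(plain_index, "lon"))         # ['lon']
```

The same term gives autocomplete-style results against one index and an exact match against the other, which is what switching "default_field" does in the queries above.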

The other answer given would not work. A different search_analyzer would mean that a search for 'london' is no longer tokenized into lon, lond, londo, london. But it would not stop a search for 'lon' from matching documents with a name of 'london', and I think stopping that match is what you want.
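A rough Python sketch of that argument, under the same caveats as any simulation: `edge_ngrams`, `matches`, and `whole_word` are hypothetical helpers, `min_gram=3` is assumed, and a match is modelled as "any query token equals any indexed token".

```python
def edge_ngrams(word, min_gram=3, max_gram=20):
    """Hypothetical stand-in for the edgeNGram filter (assumed settings)."""
    upper = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, upper + 1)]


def whole_word(query):
    """Models a search-time analyzer that leaves the query as one token."""
    return [query]


def matches(doc_name, query, search_analyze):
    """True if any query token is among the edge n-grams indexed for doc_name."""
    indexed = set(edge_ngrams(doc_name))
    return any(token in indexed for token in search_analyze(query))


# Same analyzer at search time: "london" expands to lon, lond, ... and hits "long".
print(matches("long", "london", edge_ngrams))   # True
# Different search-time analyzer: "london" stays whole, so "long" no longer matches.
print(matches("long", "london", whole_word))    # False
# But "lon" still matches "london", because lon was indexed as one of its prefixes.
print(matches("london", "lon", whole_word))     # True
```

Changing only the search-time side fixes the first case but not the last one, because the prefix tokens are already in the index.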
