Emulate a SQL LIKE search with ElasticSearch

I'm just beginning with ElasticSearch and trying to implement an autocomplete feature based on it.

I have an autocomplete index with a field city of type string. Here's an example of a document stored into this index:

{  
   "_index":"autocomplete_1435797593949",
   "_type":"listing",
   "_id":"40716",
   "_source":{  
      "city":"Rome",
      "tags":[  
         "listings"
      ]
   }
}

The analysis configuration looks like this:

{  
   "analyzer":{  
      "autocomplete_term":{  
         "tokenizer":"autocomplete_edge",
         "filter":[  
            "lowercase"
         ]
      },
      "autocomplete_search":{  
         "tokenizer":"keyword",
         "filter":[  
            "lowercase"
         ]
      }
   },
   "tokenizer":{  
      "autocomplete_edge":{  
         "type":"nGram",
         "min_gram":1,
         "max_gram":100
      }
   }
}

The mappings:

{  
   "autocomplete_1435795884170":{  
      "mappings":{  
         "listing":{  
            "properties":{  
               "city":{  
                  "type":"string",
                  "analyzer":"autocomplete_term"
               }
            }
         }
      }
   }
}

I'm sending the following query to ES:

{  
   "query":{  
      "multi_match":{  
         "query":"Rio",
         "analyzer":"autocomplete_search",
         "fields":[  
            "city"
         ]
      }
   }
}

As a result, I get the following:

{  
   "took":2,
   "timed_out":false,
   "_shards":{  
      "total":5,
      "successful":5,
      "failed":0
   },
   "hits":{  
      "total":1,
      "max_score":2.7742395,
      "hits":[  
         {  
            "_index":"autocomplete_1435795884170",
            "_type":"listing",
            "_id":"53581",
            "_score":2.7742395,
            "_source":{  
               "city":"Rio",
               "tags":[  
                  "listings"
               ]
            }
         }
      ]
   }
}

For the most part, it works: it finds the document with city = "Rio" before the user has typed the whole word ("Ri" is enough).

And here lies my problem. I want it to return "Rio de Janeiro", too. To get "Rio de Janeiro", I need to send the following query:

  {  
       "query":{  
          "multi_match":{  
             "query":"Rio d",
             "analyzer":"standard",
             "fields":[  
                "city"
             ]
          }
       }
    }

Notice the "<whitespace>d" there.

Another related problem is that I'd expect at least all cities that start with an "R" to be returned with the following query:

  {  
       "query":{  
          "multi_match":{  
             "query":"R",
             "analyzer":"standard",
             "fields":[  
                "city"
             ]
          }
       }
    }

I'd expect "Rome" and others (documents that exist in the index), but again I only get "Rio". I would like it to behave like the SQL LIKE condition, i.e. ... LIKE 'CityName%'.

What am I doing wrong?

Tags: elasticsearch sql-like

2 Answers

chillily

#2 · 2019-05-02 16:46

Elasticsearch has a Completion Suggester that is meant for giving suggestions like this.


Lonely孤独者°

#3 · 2019-05-02 16:54

I would do it like this:

Change the tokenizer to edge_nGram, since you said you need LIKE 'CityName%' (meaning a prefix match):

  "tokenizer": {
    "autocomplete_edge": {
      "type": "edge_nGram",
      "min_gram": 1,
      "max_gram": 100
    }
  }
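To see why the tokenizer type matters, here is a minimal Python sketch of the difference. It is a simplified model, not Elasticsearch's implementation, and the name all_ngrams is made up for illustration. An nGram tokenizer emits substrings starting at every position, which behaves more like LIKE '%text%', while edge n-grams are only the ones anchored at the start.

```python
def all_ngrams(text, min_gram=1, max_gram=100):
    """Simplified model of an nGram tokenizer: substrings from every start position."""
    tokens = set()
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(text):
                break  # no longer substrings are available from this position
            tokens.add(text[start:end])
    return tokens


city = "rome"
ngram_tokens = all_ngrams(city)
# Edge n-grams are the subset of those tokens that are prefixes of the text.
edge_tokens = {token for token in ngram_tokens if city.startswith(token)}

print(sorted(ngram_tokens))
print(sorted(edge_tokens))
```

With plain n-grams a token such as "om" ends up in the index, so an input of "om" would match Rome; with edge n-grams only prefixes of the city name are indexed.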

Have the field specify your autocomplete_search as its search_analyzer. I think a keyword tokenizer with a lowercase filter is a good choice here:

  "mappings": {
    "listing": {
      "properties": {
        "city": {
          "type": "string",
          "index_analyzer": "autocomplete_term",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }

The query itself is then as simple as:

{
  "query": {
    "multi_match": {
      "query": "R",
      "fields": [
        "city"
      ]
    }
  }
}

The detailed explanation goes like this: at index time the city names are split into edge n-grams. For example, for Rio de Janeiro you'll index something like:

           "city": [
              "r",
              "ri",
              "rio",
              "rio ",
              "rio d",
              "rio de",
              "rio de ",
              "rio de j",
              "rio de ja",
              "rio de jan",
              "rio de jane",
              "rio de janei",
              "rio de janeir",
              "rio de janeiro"
           ]

Notice that everything is lowercased. Now you want your query to take any text (lowercase or not) and match it against what's in the index, so an R should match an entry in the list above.
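The token list for Rio de Janeiro can be reproduced with a few lines of Python. This is only a sketch of what the autocomplete_term analyzer is expected to produce, assuming the whole field value is treated as one piece of text; the function name edge_ngrams is hypothetical.

```python
def edge_ngrams(text, min_gram=1, max_gram=100):
    """Return the lowercased prefixes of text, from min_gram to max_gram characters."""
    # In Elasticsearch the lowercase filter runs after the tokenizer;
    # lowercasing first gives the same tokens in this simplified model.
    lowered = text.lower()
    longest = min(max_gram, len(lowered))
    return [lowered[:size] for size in range(min_gram, longest + 1)]


tokens = edge_ngrams("Rio de Janeiro")
for token in tokens:
    print(repr(token))
```

Each printed token is one prefix of the lowercased city name, including the prefixes that end in a space.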

For this to happen, the input text should be lowercased but otherwise kept exactly as the user typed it, meaning it shouldn't be split into tokens. Why would you want this? Because you have already split the city names into n-grams and you don't want the same done to the input text. If the user inputs "RI", Elasticsearch will lowercase it to ri and match it exactly against what it has in the index.
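The whole idea (prefixes at index time, a single lowercased token at search time) can be sketched with an in-memory lookup table. This is a toy model of the matching logic under those assumptions, not how Elasticsearch stores its index, and all function names and the sample city list are made up for illustration.

```python
from collections import defaultdict


def index_tokens(city, max_gram=100):
    """Index-time side: every lowercased prefix of the city name."""
    lowered = city.lower()
    return {lowered[:n] for n in range(1, min(max_gram, len(lowered)) + 1)}


def search_token(user_input):
    """Search-time side: the whole input as one lowercased token."""
    return user_input.lower()


def build_index(cities):
    """Map each prefix token to the set of cities that produced it."""
    inverted = defaultdict(set)
    for city in cities:
        for token in index_tokens(city):
            inverted[token].add(city)
    return inverted


def prefix_search(inverted, user_input):
    """Exact lookup of the search token, which acts like LIKE 'input%'."""
    return sorted(inverted.get(search_token(user_input), set()))


inverted = build_index(["Rome", "Rio", "Rio de Janeiro", "Paris"])
print(prefix_search(inverted, "R"))
print(prefix_search(inverted, "RI"))
print(prefix_search(inverted, "Rio d"))
```

Because the input is never split, "Rio d" is looked up as the single token "rio d", which only Rio de Janeiro produced at index time.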

A probably faster alternative to multi_match is a term filter, but this requires your application/website to lowercase the text, because term doesn't analyze the input text at all.

{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "city": {
            "value": "ri"
          }
        }
      }
    }
  }
}
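If you go the term route, the lowercasing has to happen in your own code before the request is sent. A minimal sketch of building that request body in Python, assuming the same filtered/term structure as above (the helper name term_filter_body is hypothetical, and newer Elasticsearch versions express this with a bool query and a filter clause instead of filtered):

```python
import json


def term_filter_body(field, user_input):
    """Build the term filter request body, lowercasing the input in the application."""
    return {
        "query": {
            "filtered": {
                "filter": {
                    # term is not analyzed, so the value must already match
                    # the lowercased tokens stored in the index.
                    "term": {field: {"value": user_input.lower()}}
                }
            }
        }
    }


body = term_filter_body("city", "RI")
print(json.dumps(body, indent=2))
```

The printed JSON is what you would send as the search request body with whatever HTTP client you already use.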
