I'm just getting started with Elasticsearch and am trying to implement an autocomplete feature based on it.
I have an autocomplete index with a field city of type string. Here's an example of a document stored in this index:
{
  "_index": "autocomplete_1435797593949",
  "_type": "listing",
  "_id": "40716",
  "_source": {
    "city": "Rome",
    "tags": [
      "listings"
    ]
  }
}
The analysis configuration looks like this:
{
  "analyzer": {
    "autocomplete_term": {
      "tokenizer": "autocomplete_edge",
      "filter": [
        "lowercase"
      ]
    },
    "autocomplete_search": {
      "tokenizer": "keyword",
      "filter": [
        "lowercase"
      ]
    }
  },
  "tokenizer": {
    "autocomplete_edge": {
      "type": "nGram",
      "min_gram": 1,
      "max_gram": 100
    }
  }
}
The mappings:
{
  "autocomplete_1435795884170": {
    "mappings": {
      "listing": {
        "properties": {
          "city": {
            "type": "string",
            "analyzer": "autocomplete_term"
          }
        }
      }
    }
  }
}
I'm sending the following query to ES:
{
  "query": {
    "multi_match": {
      "query": "Rio",
      "analyzer": "autocomplete_search",
      "fields": [
        "city"
      ]
    }
  }
}
As a result, I get the following:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 2.7742395,
    "hits": [
      {
        "_index": "autocomplete_1435795884170",
        "_type": "listing",
        "_id": "53581",
        "_score": 2.7742395,
        "_source": {
          "city": "Rio",
          "tags": [
            "listings"
          ]
        }
      }
    ]
  }
}
For the most part, it works. It does find the document with a city = "Rio" before the user has to actually type the whole word ("Ri" is enough).
And here lies my problem. I want it to return "Rio de Janeiro", too. To get "Rio de Janeiro", I need to send the following query:
{
  "query": {
    "multi_match": {
      "query": "Rio d",
      "analyzer": "standard",
      "fields": [
        "city"
      ]
    }
  }
}
Notice the "<whitespace>d" there.
Another related problem is that I'd expect at least all cities that start with an "R" to be returned with the following query:
{
  "query": {
    "multi_match": {
      "query": "R",
      "analyzer": "standard",
      "fields": [
        "city"
      ]
    }
  }
}
I'd expect "Rome", etc. (which is a document that exists in the index); however, I only get "Rio" again. I would like it to behave like the SQL LIKE condition, i.e. ... LIKE 'CityName%'.
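To make the expected behaviour concrete, here is a minimal sketch in plain Python (not Elasticsearch) of the prefix matching described above. The function name like_prefix and the sample city "Paris" are made up for illustration, and ignoring case is an assumption, since SQL LIKE case sensitivity depends on the database.

```python
def like_prefix(cities, typed):
    """Emulate SQL `city LIKE 'typed%'`, ignoring case (an assumption)."""
    prefix = typed.lower()
    return [city for city in cities if city.lower().startswith(prefix)]


# Hypothetical sample data; only Rio, Rio de Janeiro and Rome come from the question.
cities = ["Rio", "Rio de Janeiro", "Rome", "Paris"]

print(like_prefix(cities, "R"))    # ['Rio', 'Rio de Janeiro', 'Rome']
print(like_prefix(cities, "Rio"))  # ['Rio', 'Rio de Janeiro']
```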
What am I doing wrong?
In Elasticsearch, there is a Completion Suggester for giving suggestions.
I would do it like this:
First, change the tokenizer type to edge_nGram, since you said you need LIKE 'CityName%' (meaning a prefix match).
Second, have the city field use your autocomplete_search as its search_analyzer. I think it's a good choice to have a keyword tokenizer and a lowercase filter there.
The detailed explanation goes like this: split your city names into edge ngrams. For example, for Rio de Janeiro you'll index something like "r", "ri", "rio", "rio ", "rio d", "rio de" and so on, up to "rio de janeiro".
You'll notice that everything is lowercased. Now, you'd want your query to take any text (lowercase or not) and match it against what's in the index. So, an R should match the list above.
For this to happen, you want the input text to be lowercased but otherwise kept exactly as the user typed it, meaning it shouldn't be split into tokens. Why would you want this? Because you have already split the city names into ngrams and you don't want the same for the input text. If the user inputs "RI", Elasticsearch will lowercase it to ri and match it exactly against what it has in the index.
A probably faster alternative to multi_match is to use a term query, but this requires your application/website to lowercase the text. The reason is that term doesn't analyze the input text at all.
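The explanation above can be sketched in plain Python. This only simulates the analysis chain; it does not call Elasticsearch. It assumes the edge n-gram tokenizer keeps every character, spaces included (no token_chars configured). All function names and the sample city "Paris" are hypothetical.

```python
def edge_ngrams(text, min_gram=1, max_gram=100):
    """Index side: lowercase the text and emit every prefix of it,
    roughly what an edge n-gram tokenizer plus a lowercase filter produce."""
    lowered = text.lower()
    longest = min(len(lowered), max_gram)
    return [lowered[:n] for n in range(min_gram, longest + 1)]


def search_token(user_input):
    """Search side: keyword tokenizer plus lowercase filter, so the whole
    input becomes a single lowercased token with no further splitting."""
    return user_input.lower()


def matching_cities(user_input, cities):
    """Return the cities whose indexed prefixes contain the search token."""
    token = search_token(user_input)
    return [city for city in cities if token in edge_ngrams(city)]


def term_query_body(user_input, field="city"):
    """Body for a term query. The application lowercases the text itself,
    because a term query does not analyze its input."""
    return {"query": {"term": {field: user_input.lower()}}}


# Hypothetical sample data for the simulation.
cities = ["Rio", "Rio de Janeiro", "Rome", "Paris"]

print(edge_ngrams("Rio de Janeiro")[:6])  # ['r', 'ri', 'rio', 'rio ', 'rio d', 'rio de']
print(matching_cities("R", cities))       # ['Rio', 'Rio de Janeiro', 'Rome']
print(matching_cities("RI", cities))      # ['Rio', 'Rio de Janeiro']
print(matching_cities("io", cities))      # [] because "io" is not a prefix of any city
print(term_query_body("RI"))              # {'query': {'term': {'city': 'ri'}}}
```

Because only prefixes are indexed, an input such as "io" matches nothing, which is the LIKE 'CityName%' behaviour asked for. A plain nGram tokenizer would also index inner substrings.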