Querying Elasticsearch Address Based Index

Posted 2019-06-06 00:57

Question:

I'm having a really hard time getting an address-based index to return results the way an autocomplete does. I have tried two different methods. I started out with nGrams and custom analyzers, but I have really struggled to get relevant results to show the way one would expect from an address autocomplete.

The second method I have focused on is the completion suggester that Elasticsearch ships with, to see if it would be any easier to get working, but I seem to be hitting a roadblock in every direction.

We send regular client-side API calls based on the input value on every key-up.

The issue I seem to face is twofold: either I'm not returning relevant enough results, or, when they are relevant, one additional character or partial word can cause no results to be returned at all.

An example would be for the following address: 7 West Hill Gardens, West Hill EX9 6BL

My documents are stored like so:

Completion Suggester

"id": "1",
"address": "7, Westhill Gardens, Bromyard HR74HW",
"suggest": "7, Westhill Gardens, Bromyard HR74HW"

Completion Suggester Mappings:

{
  "mappings": {
    "addresses": {
      "properties": {
        "suggest": {
          "type": "completion",
          "preserve_separators": false,
          "analyzer": "standard",
          "search_analyzer": "standard"
        },
        "address": {
          "type": "text"
        },
        "id": {
          "type": "keyword"
        }
      }
    }
  }
}

Note that I set preserve_separators to false in the suggester so that west hill is also matched as westhill. This works fine with the suggester; however, in my nGram index I'm unsure how to enable the same functionality with mappings, and I believe that may be part of the reason I'm not getting relevant results.

With the suggester, when I query for 7 westhill gardens using the following query (I also tried without fuzzy and with fuzziness 1):

{
  "suggest": {
    "suggestions": {
      "prefix": "7 westhill gardens",
      "completion": {
        "field": "suggest",
        "fuzzy": {
          "fuzziness": 2
        }
      }
    }
  }
}

The following results are returned:

"address": "7, Westhill Gardens, Brackley NN136AA",
"address": "7, Westhill Gardens, Bromyard HR74HW",
"address": "7, West Hill Gardens, West Hill, Budleigh Salterton EX96BL",

However, if I remove the number 7 from the query, it returns no results. This is a key issue, as not all users will start their query with the house number, and it is quite common to search for west hill gardens as opposed to 7 west hill gardens:

{
  "suggest": {
    "suggestions": {
      "prefix": "westhill gardens",
      "completion": {
        "field": "suggest",
        "fuzzy": {
          "fuzziness": 2
        }
      }
    }
  }
}

And lastly, if I query for just the house number and postcode as shown below, no results are returned:

{
  "suggest": {
    "suggestions": {
      "prefix": "7 EX9 6BL",
      "completion": {
        "field": "suggest",
        "fuzzy": {
          "fuzziness": 2
        }
      }
    }
  }
}

I'm hoping someone with more experience than me can share some thoughts on what the best approach would be, and whether I should stick with nGrams and try to get a custom analyzer / filter approach working. Or am I just doing it totally wrong? I have only just started to learn Elasticsearch, so my apologies if my terminology is incorrect.

Answer 1:

Think of the Completion Suggester as a "starts with ..." mechanism. The documentation says: "The completion suggester is a so-called prefix suggester." So with this type of search you probably cannot have everything you want.

To get a bit closer, one solution is a combination of preserve_position_increments and a stop-word analyzer. First create the index with the following settings:
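
To make the prefix behaviour concrete, here is a rough Python model of what the mapping in the question does. It is only a sketch under assumptions: analyze approximates the standard analyzer with a lowercase alphanumeric split, completion_key joins the tokens with nothing between them (as preserve_separators: false does), and fuzziness is ignored. The function names are made up for illustration and are not part of any Elasticsearch client.

```python
import re

def analyze(text):
    # Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

def completion_key(text):
    # preserve_separators=false: tokens are joined with nothing between them.
    return "".join(analyze(text))

def suggests(prefix, stored):
    # A prefix suggester only matches from the very first character of the input.
    return completion_key(stored).startswith(completion_key(prefix))

stored = "7, Westhill Gardens, Bromyard HR74HW"
print(suggests("7 westhill gardens", stored))  # True
print(suggests("westhill gardens", stored))    # False: the stored key starts with "7"
print(suggests("7 EX9 6BL",
               "7, West Hill Gardens, West Hill, Budleigh Salterton EX96BL"))  # False
```

In this model the stored key is "7westhillgardensbromyardhr74hw", so anything that does not begin with the house number cannot be a prefix of it, which matches the empty results reported in the question.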

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_stop_analyzer": {
          "type": "stop"
        }
      }
    }
  }
}

and then the mapping for the document type:

{
  "properties": {
    "suggest": {
      "type": "completion",
      "preserve_separators": false,
      "preserve_position_increments": false
    },
    "address": {
      "type": "text"
    },
    "id": {
      "type": "keyword"
    }
  }
}

Then this query:

{
  "suggest": {
    "suggestions": {
      "prefix": "westhill gardens",
      "completion": {
        "field": "suggest",
        "fuzzy": {
          "fuzziness": 2
        }
      }
    }
  }
}

would result in both:

"address": "5, West hill Gardens, Bromyard AAA"
"address": "7, Westhill Gardens, Bromyard HR74HW"

But if you try to search for "prefix": "7 gardens", it won't give you results (because of the prefix-suggester nature of this mechanism).
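
One way to see why the house number stops getting in the way: a completion field with no explicit analyzer falls back to the simple analyzer, and both simple and stop split on anything that is not a letter, so digits never become tokens. The Python below is a rough model of that, not Elasticsearch itself. It assumes ASCII-only tokenizing, skips stop-word removal, ignores fuzziness, and uses hypothetical function names.

```python
import re

def letter_tokens(text):
    # Approximates the letter-based tokenizing behind `simple` and `stop`:
    # only runs of letters survive, so "7" and the digits in "HR74HW" vanish.
    return re.findall(r"[a-z]+", text.lower())

def key(text):
    # Tokens joined without separators, as with preserve_separators=false.
    return "".join(letter_tokens(text))

def suggests(prefix, stored):
    # Still a pure "starts with" comparison.
    return key(stored).startswith(key(prefix))

addresses = (
    "5, West hill Gardens, Bromyard AAA",
    "7, Westhill Gardens, Bromyard HR74HW",
)
for stored in addresses:
    print(stored, suggests("westhill gardens", stored), suggests("7 gardens", stored))
```

Both addresses match "westhill gardens" in this model and neither matches "7 gardens". A side effect worth keeping in mind is that the house number no longer takes part in matching at all, so "7 westhill" and "9 westhill" behave identically here.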

What could be another option? nGrams, as already said, or you could also experiment with query_string. Simple example, let's say you have a standard mapping:

{
  "properties": {
    "suggest": {
      "type": "text"
    },
    "address": {
      "type": "text"
    },
    "id": {
      "type": "keyword"
    }
  }
}

then using query_string:

{
  "query": {
    "query_string": {
      "default_field": "suggest",
      "query": "west* Gardens*",
      "default_operator": "OR",
      "split_on_whitespace": "true",
      "fuzziness": 2
    }
  }
}

It gives me results such as:

"address": "267, Westhill Gardens, Bromyard HR74HW",
"address": "5, West hill Gardens, Bromyard AAA",
"address": "1, West hill Bromyard HR74HW"

But please note that the * wildcard costs performance and memory (definitely avoid a * at the beginning of a term); on the other hand, query_string is a very versatile tool.

Update for the NGram case:

As I mentioned NGrams before, I'll post a first idea for them here.
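
Since the autocomplete sends whatever the user has typed, the wildcard query has to be built from raw input. A minimal sketch of that step in Python follows. build_prefix_query is a hypothetical helper, not an Elasticsearch API, and it assumes that keeping only runs of letters and digits is acceptable, which sidesteps the reserved characters of the query_string syntax such as ":" or "/".

```python
import re

def build_prefix_query(user_input, field="suggest"):
    # Keep only letter/digit runs so no reserved query_string character
    # reaches the parser, then add a trailing (never leading) wildcard.
    terms = [term + "*" for term in re.findall(r"[A-Za-z0-9]+", user_input)]
    return {
        "query": {
            "query_string": {
                "default_field": field,
                "query": " ".join(terms),
                "default_operator": "OR",
            }
        }
    }

body = build_prefix_query("west Gardens")
print(body["query"]["query_string"]["query"])  # west* Gardens*
```

The returned dictionary can be serialized as the request body of a search. Input with no letters or digits yields an empty query string, so the caller would want to skip the request in that case.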

Some initial assumptions:

  • enable the autocomplete after 3 characters have been entered (setting: "min_gram": 3)
  • we need to analyze digits, spaces, commas etc.: if the user types "7, W" we need to get a set of results
  • for testing, enable term vectors on the ngram field to see how it really works (setting "term_vector": "yes"), but disable this in production

Mapping - for index and type - looks like this:

{
   "settings": {
      "number_of_shards": 1,
      "analysis": {
         "tokenizer": {
            "ngram_tokenizer": {
               "type": "nGram",
               "min_gram": 3,
               "max_gram": 10
            }
         },
         "analyzer": {
            "ngram_tokenizer_analyzer": {
               "type": "custom",
               "tokenizer": "ngram_tokenizer"
            }
         }
      }
   },
   "mappings": {
      "addresses": {
         "properties": {
            "suggest": {
               "type": "text",
               "term_vector": "yes",
               "analyzer": "ngram_tokenizer_analyzer"
            },
            "address": {
              "type": "text"
            },
            "id": {
              "type": "keyword"
            }
         }
      }
   }
}

Now a document can be indexed. You can check how the analyzer works (thanks to "term_vector": "yes") with:

GET http://127.0.0.1:9200/sug/addresses/{documentId}/_termvector?fields=suggest
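
If no cluster is at hand, the grams that request would list can be approximated offline. The Python function below is a small model of an ngram tokenizer with min_gram 3 and max_gram 10 that keeps every character, which is the tokenizer's default when no token_chars are configured. The function name is made up for illustration.

```python
def ngrams(text, min_gram=3, max_gram=10):
    # Every substring whose length is between min_gram and max_gram, inclusive.
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams

print(ngrams("7, W"))  # ['7, ', '7, W', ', W']
```

Digits, commas and spaces all end up inside grams, which is what the "7, W" assumption above relies on. Because the analyzer defines no token filters, nothing lowercases the grams, so this model keeps the original case as well.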

And after that the query (Bool Query this time) is really simple:

{
  "query": {
    "bool": {
      "must": [
        { "match": { "suggest": { "query": "1, Westhil" } } }
      ]
    }
  }
}

I think it should meet all the requirements you described: searching by the starting part of the address, by house number or by any other part, and also the issue with spaces. You can decrease min_gram to 2 if this is really necessary. If you need more details feel free to ask or, as you suggested, open a new question.

Answer 2:
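
To get a feel for why this copes with "West hill" versus "Westhill", the Python sketch below counts how many grams a query shares with each stored address. It rests on two facts: a match query analyzes the query text with the field's analyzer when no search_analyzer is set, and its default operator is OR, so sharing any gram is enough to match. The shared-gram count is only a crude stand-in for ranking, not Elasticsearch's actual scoring, and the sample "High Street" address is invented for the illustration.

```python
def ngrams(text, min_gram=3, max_gram=10):
    # Set of all substrings with a length from min_gram to max_gram.
    return {text[i:i + n]
            for i in range(len(text))
            for n in range(min_gram, max_gram + 1)
            if i + n <= len(text)}

def shared_grams(query, stored):
    # Number of grams the query and the stored address have in common.
    return len(ngrams(query) & ngrams(stored))

docs = [
    "7, Westhill Gardens, Bromyard HR74HW",
    "5, West hill Gardens, Bromyard AAA",
    "1, High Street, Bromyard HR74AA",  # invented address, shares only "1, "
]
query = "1, Westhil"
for doc in sorted(docs, key=lambda d: shared_grams(query, d), reverse=True):
    print(shared_grams(query, doc), doc)
```

In this model the "Westhill" address shares the most grams, the "West hill" variant still shares several, and the unrelated address matches through the single gram "1, ". Loosely related documents can therefore show up at the bottom of the result list.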

The completion suggester completes only the exact term which is given in the completion field, so a query without "7" returns zero results.

The solution you thought about with nGrams is the way to go.