NEST Query for exact text matching

2020-07-11 04:52发布

问题:

I am trying to write a NEST query that should return results based on exact string match. I have researched on web and there are suggestions about using Term, Match, MatchPhrase. I have tried all those but my searches are returning results that contains part of search string. For example, In my database i have following rows of email addresses:

ter@gmail.com

ter@hotmail.com

terrance@hotmail.com

Irrespective of whether i use:

client.Search<Emails>(s => s.From(0)
                        .Size(MaximumSearchResultsSize)
                        .Query(q => q.Term( p=> p.OnField(fielname).Value(fieldValue))))

or

  client.Search<Emails>(s => s.From(0).
                              Size(MaximumPaymentSearchResults).
                              Query(q=>q.Match(p=>p.OnField(fieldName).Query(fieldValue))));                                              

My search results are always returning rows containing "partial search" string.

So, if i provide the search string as "ter", I am still getting all the 3 rows. ter@gmail.com

ter@hotmail.com

terrance@hotmail.com

I expect to see no rows returned if the search string is "ter".If the search string is "ter@hotmail.com" then i would like to see only "ter@hotmail.com".

Not sure what am i doing wrong.

回答1:

Based on the information you have provided in the question, it sounds like the field that contains the email address has been indexed with the Standard Analyzer, the default analyzer applied to string fields if no other analyzer has been specified or the field is not marked as not_analyzed.

The implications of the standard analyzer on a given string input can be seen by using the Analyze API of Elasticsearch:

curl -XPOST "http://localhost:9200/_analyze?analyzer=standard&text=ter%40gmail.com

The text input needs to be url encoded, as demonstrated here with the @ symbol. The results of running this query are

{
   "tokens": [
      {
         "token": "ter",
         "start_offset": 0,
         "end_offset": 3,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "gmail.com",
         "start_offset": 4,
         "end_offset": 13,
         "type": "<ALPHANUM>",
         "position": 2
      }
   ]
}

We can see that the standard analyzer produces two tokens for the input, ter and gmail.com, and this is what will be stored in the inverted index for the field.

Now, running a Match query will cause the input to the match query to be analyzed, by default using the same analyzer as the one found in the mapping definition for the field on which the match query is being applied.

The resulting tokens from this match query analysis are then combined by default into a boolean or query such that any document that contains any one of the tokens in inverted index for the field will be a match. For the example

text ter@gmail.com, this would mean any documents that have a match for ter or gmail.com for the field would be a hit

// Indexing
input: ter@gmail.com -> standard analyzer -> ter,gmail.com in inverted index

// Querying
input: ter@gmail.com -> match query -> docs with ter or gmail.com are a hit!

Clearly, for an exact match, this is not what we intend at all!

Running a Term query will cause the input to the term query to not be analyzed i.e. it's a query for an exact match to the term input, but running this on a field that has been analyzed at index time could potentially be a problem; since the value for the field has undergone analysis but the input to the term query has not, you are going to get results returned that exactly match the term input as a result of the analysis that happened at index time. For example

// Indexing
input: ter@gmail.com -> standard analyzer -> ter,gmail.com in inverted index

// Querying
input: ter@gmail.com -> term query -> No exact matches for ter@gmail.com

input: ter -> term query -> docs with ter in inverted index are a hit!

This is not what we want either!

What we probably want to do with this field is set it to be not_analyzed in the mapping definition

putMappingDescriptor
    .MapFromAttributes()
    .Properties(p => p
        .String(s => s.Name(n => n.FieldName).Index(FieldIndexOption.NotAnalyzed)
    );

With this in place, we can search for exact matches with a Term filter using a Filtered query

// change dynamic to your type
var docs = client.Search<dynamic>(b => b
    .Query(q => q
        .Filtered(fq => fq
            .Filter(f => f
                .Term("fieldName", "ter@gmail.com")
            )
        )
    )
);

which will produce the following query DSL

{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "fieldName": "ter@gmail.com"
        }
      }
    }
  }
}


回答2:

You can also do a MatchPhrasePrefix query to get an 'Exact' match performed.

.MatchPhrasePrefix(ma => ma.Field(field).Query(query))