I am trying to write a NEST query that returns results based on an exact string match. I have researched on the web and found suggestions to use Term, Match and MatchPhrase. I have tried all of those, but my searches return results that contain part of the search string. For example, in my database I have the following email addresses:
ter@gmail.com
ter@hotmail.com
terrance@hotmail.com
Irrespective of whether I use:
client.Search<Emails>(s => s.From(0)
    .Size(MaximumSearchResultsSize)
    .Query(q => q.Term(p => p.OnField(fieldName).Value(fieldValue))));
or
client.Search<Emails>(s => s.From(0)
    .Size(MaximumPaymentSearchResults)
    .Query(q => q.Match(p => p.OnField(fieldName).Query(fieldValue))));
My searches always return rows that contain the partial search string.
So if I provide the search string "ter", I still get all 3 rows:
ter@gmail.com
ter@hotmail.com
terrance@hotmail.com
I expect no rows to be returned if the search string is "ter". If the search string is "ter@hotmail.com", then I would like to see only "ter@hotmail.com".
I'm not sure what I am doing wrong.
Based on the information you have provided in the question, it sounds like the field that contains the email address has been indexed with the Standard Analyzer, which is the default analyzer applied to string fields when no other analyzer has been specified and the field is not marked as not_analyzed. The effect of the standard analyzer on a given string input can be seen by using the Analyze API of Elasticsearch.
The text input needs to be URL-encoded; for example, the @ symbol is sent as %40. Running the analysis on ter@gmail.com shows that the standard analyzer produces two tokens for the input, ter and gmail.com, and these are what will be stored in the inverted index for the field.
Now, running a Match query causes the input to the match query to be analyzed as well, by default with the same analyzer as the one in the mapping definition for the field on which the match query is applied.
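To make the analysis step concrete, here is a small toy sketch in C#. It does not use NEST or Elasticsearch; `ToyStandardAnalyzer` is a hypothetical name, and its tokenizer is only a rough approximation of the Standard Analyzer (lowercase, then split on characters that are not letters, digits or dots). The real analyzer follows Unicode word-break rules, but for email addresses like the ones in the question it gives the same split.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical toy approximation of the Standard Analyzer, for illustration only.
// The real tokenizer uses Unicode word-break rules (UAX #29); this one simply
// lowercases the text and splits on anything that is not a letter, digit or '.'.
public static class ToyStandardAnalyzer
{
    public static List<string> Analyze(string text) =>
        Regex.Split(text.ToLowerInvariant(), @"[^\p{L}\p{Nd}.]+")
             .Select(t => t.Trim('.'))      // drop stray leading/trailing dots
             .Where(t => t.Length > 0)      // drop empty pieces from the split
             .ToList();
}
```

Calling `ToyStandardAnalyzer.Analyze("ter@gmail.com")` yields the two tokens `ter` and `gmail.com`. The same function is applied to the match query input, which is why the query text is broken up in exactly the same way.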
The tokens resulting from this match query analysis are then combined, by default, into a boolean OR query, so that any document containing any one of the tokens in the inverted index for the field is a match. For the example text ter@gmail.com, this means that any document with a match for ter or gmail.com in the field would be a hit. Clearly, for an exact match, this is not what we intend at all!
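The OR behaviour can be sketched with a toy inverted index. This is a self-contained model, not Elasticsearch: `ToyMatchQuery`, `BuildIndex` and `Match` are hypothetical names, the analyzer is the same rough approximation of the Standard Analyzer, and document ids are simply positions in the input list.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical toy model of an analyzed field and a match query (default operator OR).
public static class ToyMatchQuery
{
    // Rough approximation of the Standard Analyzer: lowercase, split on non-word chars.
    static List<string> Analyze(string text) =>
        Regex.Split(text.ToLowerInvariant(), @"[^\p{L}\p{Nd}.]+")
             .Select(t => t.Trim('.'))
             .Where(t => t.Length > 0)
             .ToList();

    // Inverted index: token -> ids of the documents that contain that token.
    public static Dictionary<string, HashSet<int>> BuildIndex(IList<string> docs)
    {
        var index = new Dictionary<string, HashSet<int>>();
        for (int id = 0; id < docs.Count; id++)
        {
            foreach (var token in Analyze(docs[id]))
            {
                if (!index.TryGetValue(token, out var ids))
                    index[token] = ids = new HashSet<int>();
                ids.Add(id);
            }
        }
        return index;
    }

    // Match query: analyze the input the same way, then return every document
    // that contains at least one of the resulting tokens (a boolean OR).
    public static SortedSet<int> Match(Dictionary<string, HashSet<int>> index, string query)
    {
        var hits = new SortedSet<int>();
        foreach (var token in Analyze(query))
            if (index.TryGetValue(token, out var ids))
                hits.UnionWith(ids);
        return hits;
    }
}
```

With the three addresses from the question indexed as documents 0, 1 and 2, `Match(index, "ter@hotmail.com")` returns all three: `ter` matches the first two documents and `hotmail.com` matches the last two.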
Running a Term query causes the input to the term query not to be analyzed, i.e. it is a query for an exact match on the term input. Running it on a field that was analyzed at index time is a problem, however: the value of the field has undergone analysis but the input to the term query has not, so you will only get results where the term input exactly matches one of the tokens produced by analysis at index time. For example, a term query for ter@gmail.com will not find the document containing ter@gmail.com, because that whole string is not one of its indexed tokens.
This is not what we want either!
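The mismatch between an analyzed field and an unanalyzed term input can be shown with the same kind of toy model. Again this is a hypothetical sketch, not Elasticsearch: `ToyTermOnAnalyzed` and its methods are illustrative names, and the analyzer is an approximation. The only change from a match query is that the term input is looked up verbatim as a single token.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical toy model: a term query run against a field analyzed at index time.
public static class ToyTermOnAnalyzed
{
    // Rough approximation of the Standard Analyzer, applied only at index time here.
    static List<string> Analyze(string text) =>
        Regex.Split(text.ToLowerInvariant(), @"[^\p{L}\p{Nd}.]+")
             .Select(t => t.Trim('.'))
             .Where(t => t.Length > 0)
             .ToList();

    // Inverted index built from the analyzed tokens: token -> document ids.
    public static Dictionary<string, HashSet<int>> BuildIndex(IList<string> docs)
    {
        var index = new Dictionary<string, HashSet<int>>();
        for (int id = 0; id < docs.Count; id++)
        {
            foreach (var token in Analyze(docs[id]))
            {
                if (!index.TryGetValue(token, out var ids))
                    index[token] = ids = new HashSet<int>();
                ids.Add(id);
            }
        }
        return index;
    }

    // Term query: the input is NOT analyzed; it must equal one indexed token exactly.
    public static SortedSet<int> Term(Dictionary<string, HashSet<int>> index, string value) =>
        index.TryGetValue(value, out var ids) ? new SortedSet<int>(ids) : new SortedSet<int>();
}
```

Here `Term(index, "ter@gmail.com")` finds nothing, while `Term(index, "ter")` finds both ter@gmail.com and ter@hotmail.com, which is the opposite of what the question expects.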
What we probably want to do with this field is set it to not_analyzed in the mapping definition, so that the whole value is stored as a single, unaltered token. With this in place, we can search for exact matches with a Term filter inside a Filtered query.
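A not_analyzed field can be modelled by indexing each whole value as one token. The sketch below is a hypothetical toy (`ToyNotAnalyzed` and its methods are illustrative names, not NEST or Elasticsearch APIs) that pairs such an index with a verbatim term lookup.

```csharp
using System.Collections.Generic;

// Hypothetical toy model of a not_analyzed field queried with a term filter.
public static class ToyNotAnalyzed
{
    // not_analyzed: the whole value is indexed as one token, with no lowercasing
    // or splitting, so "ter@hotmail.com" is stored exactly as written.
    public static Dictionary<string, HashSet<int>> BuildKeywordIndex(IList<string> docs)
    {
        var index = new Dictionary<string, HashSet<int>>();
        for (int id = 0; id < docs.Count; id++)
        {
            if (!index.TryGetValue(docs[id], out var ids))
                index[docs[id]] = ids = new HashSet<int>();
            ids.Add(id);
        }
        return index;
    }

    // Term filter: the input is compared verbatim against the stored tokens.
    public static SortedSet<int> Term(Dictionary<string, HashSet<int>> index, string value) =>
        index.TryGetValue(value, out var ids) ? new SortedSet<int>(ids) : new SortedSet<int>();
}
```

With the question's three addresses, `Term(index, "ter")` finds nothing and `Term(index, "ter@hotmail.com")` finds only the second document, which is the behaviour the question asks for. A consequence of skipping analysis is that the comparison is case-sensitive: in this model "Ter@hotmail.com" would not match.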
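You can also use a MatchPhrasePrefix query to get an 'exact' match, although it treats the last term of the analyzed input as a prefix, so on an analyzed field a search for "ter" would still match terrance@hotmail.com.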
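The prefix behaviour of a match_phrase_prefix query can be approximated with another toy model. This is a hypothetical sketch, not Elasticsearch: `ToyMatchPhrasePrefix` is an illustrative name, the analyzer is the same rough approximation, and the real query also limits how many terms the prefix may expand to, which this model ignores.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical toy model of match_phrase_prefix on an analyzed field.
public static class ToyMatchPhrasePrefix
{
    // Rough approximation of the Standard Analyzer.
    static List<string> Analyze(string text) =>
        Regex.Split(text.ToLowerInvariant(), @"[^\p{L}\p{Nd}.]+")
             .Select(t => t.Trim('.'))
             .Where(t => t.Length > 0)
             .ToList();

    // A document matches if its tokens contain the query tokens consecutively and in
    // order, where the LAST query token only needs to be a prefix of the document token.
    public static List<int> Search(IList<string> docs, string query)
    {
        var q = Analyze(query);
        var hits = new List<int>();
        if (q.Count == 0) return hits;
        for (int id = 0; id < docs.Count; id++)
        {
            var d = Analyze(docs[id]);
            for (int start = 0; start + q.Count <= d.Count; start++)
            {
                bool ok = true;
                for (int k = 0; k < q.Count && ok; k++)
                {
                    ok = k == q.Count - 1
                        ? d[start + k].StartsWith(q[k], StringComparison.Ordinal)
                        : d[start + k] == q[k];
                }
                if (ok) { hits.Add(id); break; }
            }
        }
        return hits;
    }
}
```

In this model, searching for "ter" returns all three addresses (terrance starts with ter), whereas searching for "ter@hotmail.com" returns only ter@hotmail.com, so the result is only 'exact' when the full value is supplied.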