I have a search index with 4 custom analyzers. Two of them are for language-specific searching, and the other 2 are for "exact" searching (no need for lemmatization). For simplicity, I am including only the info for the language-specific custom analyzers, although the overall solution will need to apply to all the custom analyzers.
{
"tokenizers": [
{
"@odata.type": "#Microsoft.Azure.Search.MicrosoftLanguageStemmingTokenizer",
"name": "text_language_search_custom_analyzer_ms_tokenizer",
"maxTokenLength": 300,
"isSearchTokenizer": false,
"language": "french"
},
{
"@odata.type": "#Microsoft.Azure.Search.MicrosoftLanguageStemmingTokenizer",
"name": "text_language_search_endsWith_custom_analyzer_ms_tokenizer",
"maxTokenLength": 300,
"isSearchTokenizer": false,
"language": "french"
}
],
"analyzers": [
{
"@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
"name": "text_language_search_custom_analyzer",
"tokenizer": "text_language_search_custom_analyzer_ms_tokenizer",
"tokenFilters": [
"lowercase",
"lang_text_synonym_token_filter",
"asciifolding"
],
"charFilters": [
"html_strip"
]
},
{
"@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
"name": "text_language_search_endsWith_custom_analyzer",
"tokenizer": "text_language_search_endsWith_custom_analyzer_ms_tokenizer",
"tokenFilters": [
"lowercase",
"lang_text_endsWith_synonym_token_filter",
"asciifolding",
"reverse"
],
"charFilters": [
"html_strip"
]
}
]
}
For simplicity, let's assume the index has only 2 searchable fields:
- CategoryLangSearch (uses text_language_search_custom_analyzer)
- CategoryLangSearchEndsWith (uses text_language_search_endsWith_custom_analyzer)

Now assume the index has only 1 document, with the following:
- CategoryLangSearch field value of "TELECOMMUNICATIONS"
- CategoryLangSearchEndsWith field value of "TELECOMMUNICATIONS"
Our UI/API layer has logic so that if the user searches TELE*, it will know to use CategoryLangSearch as the field to search in. Likewise, our UI/API layer will detect if the user searches with an asterisk wildcard at the front. So if the user searches for *TIONS, the UI/API layer is smart enough to search against the CategoryLangSearchEndsWith field instead.
All that is great... it works exactly as intended.
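The routing described above could be sketched roughly as follows. This is only an illustration, not our actual code: the function name `route_wildcard_query` is hypothetical, and it assumes that a suffix search is sent to the reversed field by reversing the term and moving the wildcard to the end. Escaping of special query characters is left out.

```python
def route_wildcard_query(user_input: str) -> str:
    """Map a user wildcard term to a field-scoped query string (illustrative sketch)."""
    term = user_input.strip()
    leading = term.startswith("*")
    trailing = term.endswith("*")
    core = term.strip("*")  # the term without its outer asterisks
    if not core:
        raise ValueError("search term is empty")
    if leading and trailing:
        # The "contains" case (*COMMU*) is the open question of this post.
        raise NotImplementedError("infix wildcard search is not handled")
    if leading:
        # Assumption: the EndsWith field stores reversed tokens, so the term
        # is reversed and turned into a prefix query.
        return f"CategoryLangSearchEndsWith:({core[::-1]}*)"
    if trailing:
        return f"CategoryLangSearch:({core}*)"
    # No wildcard at all: plain term search on the language field.
    return f"CategoryLangSearch:({core})"
```

With this sketch, `TELE*` is routed to CategoryLangSearch and `*TIONS` is routed to CategoryLangSearchEndsWith, while `*COMMU*` is exactly the case that has no field to go to.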
The problem, however, is what we can do if the user searches with *COMMU*, that is, with an asterisk wildcard at both the front and the end of the term.
I thought it would be "smart" if I built the Azure Search query like this: (CategoryLangSearch:(COMMU*) OR CategoryLangSearchEndsWith:(*UMMOC)) but, in practice, I found that this does not find TELECOMMUNICATIONS ORGANIZATION. This makes perfect sense when I look at the query we build.
So, my question is, how do we pull this off? Can we pull it off in Azure Search in any way, shape, or form? I don't see a path to success for this one. The only possible solution I could see is the following:
1. If the user searches for something...
2. First query our MS SQL server directly, using the %something% syntax that SQL supports.
3. Find the IDs that match, and then use THOSE to search against the Azure Search index.