I have an index with many fields, and one field "ServiceCategories" has data similar to this:
|Case Management|Developmental Disabilities
I need to break up the data by the separator "|" and I have attempted to do so with this:
var descriptor = new CreateIndexDescriptor(_DataSource.ToLower())
    .Mappings(ms => ms
        .Map<ProviderContent>(m => m
            .AutoMap()
            .Properties(p => p
                .String(s => s
                    .Name(n => n.OrganizationName)
                    .Fields(f => f
                        .String(ss => ss.Name("raw").NotAnalyzed())))
                .String(s => s
                    .Name(n => n.ServiceCategories)
                    .Analyzer("tab_delim_analyzer"))
                .GeoPoint(g => g.Name(n => n.Location).LatLon(true)))))
    .Settings(st => st
        .Analysis(an => an
            .Analyzers(anz => anz
                .Custom("tab_delim_analyzer", td => td
                    .Filters("lowercase")
                    .Tokenizer("tab_delim_tokenizer")))
            .Tokenizers(t => t
                .Pattern("tab_delim_tokenizer", tdt => tdt
                    .Pattern("|")))));

_elasticClientWrapper.CreateIndex(descriptor);
My search code for ServiceCategories (serialized to Elasticsearch as serviceCategories) uses a simple TermQuery with the value set to lower case.
It gets no results for this search parameter (the other parameters work fine). The expected behavior is an exact match on at least one of the terms above.
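For reference, a minimal sketch of what that query might look like when built with NEST (this assumes NEST 2.x syntax and that _elasticClientWrapper exposes the client's Search method; it simply mirrors the JSON request shown further down):

// hypothetical sketch of the term query described above
var searchResponse = _elasticClientWrapper.Search<ProviderContent>(s => s
    .From(0)
    .Size(10)
    .Sort(so => so.Ascending(f => f.OrganizationName))
    .Query(q => q
        .Bool(b => b
            .Must(
                mu => mu.MatchAll(),
                mu => mu.Term(t => t
                    .Field(f => f.ServiceCategories)
                    .Value("developmental disabilities"))))));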
I have attempted to get it working by using a classic tokenizer as well:
var descriptor = new CreateIndexDescriptor(_DataSource.ToLower())
    .Mappings(ms => ms
        .Map<ProviderContent>(m => m
            .AutoMap()
            .Properties(p => p
                .String(s => s
                    .Name(n => n.OrganizationName)
                    .Fields(f => f
                        .String(ss => ss.Name("raw").NotAnalyzed())))
                .String(s => s
                    .Name(n => n.ServiceCategories)
                    .Analyzer("classic_tokenizer")
                    .SearchAnalyzer("standard"))
                .GeoPoint(g => g.Name(n => n.Location).LatLon(true)))))
    .Settings(s => s
        .Analysis(an => an
            .Analyzers(a => a.Custom("classic_tokenizer", ca => ca
                .Tokenizer("classic")))));
This isn't working either. Can anyone help me identify where I am going wrong?
Here's the search request:
### ES REQUEST ###
{
  "from": 0,
  "size": 10,
  "sort": [
    {
      "organizationName": {
        "order": "asc"
      }
    }
  ],
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        },
        {
          "term": {
            "serviceCategories": {
              "value": "developmental disabilities"
            }
          }
        }
      ]
    }
  }
}
Your pattern for tab_delim_tokenizer is close, but not quite correct :) The easiest way to see this is to use the Analyze API to understand how an analyzer will tokenize a piece of text. With your first mapping in place, we can check what the custom analyzer does.
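A sketch of that check (the sample text comes from the question; the Analyze call shape assumes NEST 2.x and that _elasticClientWrapper exposes the client's Analyze method):

// hypothetical sketch: run the sample text through the custom analyzer
var analyzeResponse = _elasticClientWrapper.Analyze(a => a
    .Index(_DataSource.ToLower())
    .Analyzer("tab_delim_analyzer")
    .Text("|Case Management|Developmental Disabilities"));

// print each token the analyzer produces
foreach (var token in analyzeResponse.Tokens)
{
    Console.WriteLine(token.Token);
}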
The tokens that come back (snipped for brevity) demonstrate that the tab_delim_tokenizer is not tokenizing how we expect: on its own, | is interpreted as a regular expression alternation rather than a literal pipe, so the text is not split where we want. A small change fixes this: escape the | in the pattern with \ and make the pattern a verbatim string literal by prefixing it with @, i.e. .Pattern(@"\|"). Here's a complete example.
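A minimal sketch of that complete round trip (the document values, index name, and Refresh call are assumptions for illustration; the only change to your mapping is the escaped pattern @"\|"):

// create the index with the corrected tokenizer pattern: @"\|" splits on a literal pipe
var descriptor = new CreateIndexDescriptor(_DataSource.ToLower())
    .Mappings(ms => ms
        .Map<ProviderContent>(m => m
            .AutoMap()
            .Properties(p => p
                .String(s => s
                    .Name(n => n.ServiceCategories)
                    .Analyzer("tab_delim_analyzer")))))
    .Settings(st => st
        .Analysis(an => an
            .Analyzers(anz => anz
                .Custom("tab_delim_analyzer", td => td
                    .Filters("lowercase")
                    .Tokenizer("tab_delim_tokenizer")))
            .Tokenizers(t => t
                .Pattern("tab_delim_tokenizer", tdt => tdt
                    .Pattern(@"\|")))));

_elasticClientWrapper.CreateIndex(descriptor);

// index a document containing the sample value and refresh so it is searchable immediately
_elasticClientWrapper.Index(new ProviderContent
{
    ServiceCategories = "|Case Management|Developmental Disabilities"
}, i => i.Index(_DataSource.ToLower()).Refresh());

// the term query from the question, unchanged, now matches the "developmental disabilities" token
var searchResponse = _elasticClientWrapper.Search<ProviderContent>(s => s
    .Index(_DataSource.ToLower())
    .Query(q => q
        .Term(t => t
            .Field(f => f.ServiceCategories)
            .Value("developmental disabilities"))));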
The search results return the indexed document, as expected.