Elastic Search 1.6
I want to index text that contains hyphens, for example U-12, U-17, WU-12, t-shirt... and to be able to use a "Simple Query String" query to search on them.
Data sample (simplified):
{"title":"U-12 Soccer",
"comment": "the t-shirts are dirty"}
As there are quite a lot of questions already about hyphens, I tried the following solution already:
Use a Char filter: ElasticSearch - Searching with hyphens in name.
So I went for this mapping:
{
"settings":{
"analysis":{
"char_filter":{
"myHyphenRemoval":{
"type":"mapping",
"mappings":[
"-=>"
]
}
},
"analyzer":{
"default":{
"type":"custom",
"char_filter": [ "myHyphenRemoval" ],
"tokenizer":"standard",
"filter":[
"standard",
"lowercase"
]
}
}
}
},
"mappings":{
"test":{
"properties":{
"title":{
"type":"string"
},
"comment":{
"type":"string"
}
}
}
}
}
Searching is done with the following query:
{"_source":true,
"query":{
"simple_query_string":{
"query":"<Text>",
"default_operator":"AND"
}
}
}
What works:
"U-12", "U*", "t*", "ts*"
What didn't work:
"U-*", "u-1*", "t-*", "t-sh*", ...
So it seems the char filter is not executed on search strings? What could I do to make this work?
the Quote from Igor Motov is true, you have to add "analyze_wildcard":true, in order to make it worked with regex. But it is important to notice that the hyphen actually tokenizes "u-12" in "u" "12", two separated words.
if preserve the original is important do not use Mapping char filter. Otherwise is kind of useful.
Imagine that you have "m0-77", "m1-77" and "m2-77", if you search m*-77 you are going to have zero hits. However you can remplace "-" (hyphen) with AND in order to connect the two separed words and then search m* AND 77 that is going to give you a correct hit.
you can do it in the client front.
In your problem u-*
t-sh*
The answer is really simple:
Quote from Igor Motov: Configuring the standard tokenizer
If anyone is still looking for a simple workaround to this issue, replace hyphen with underscore
_
when indexing data.For eg, O-000022334 should indexed as O_000022334.
When searching, replace underscore back to hyphen again when displaying results. This way you can search for "O-000022334" and it will find a correct match.