I have a problem searching for special characters in Solr. My document has a field "title", and sometimes its value looks like "Titanic - 1999" (it contains the character "-"). When I search in Solr with "-", I receive a 400 error. I've tried to escape the character, so I tried something like "-" and "\-". With those changes Solr no longer responds with an error, but it returns 0 results.
How can I search in the Solr admin for such a special character (something like "-" or "'")?
Regards
UPDATE: Here you can see my current Solr schema: https://gist.github.com/cpalomaresbazuca/6269375
I am searching on the field "Title".
Excerpt from the schema.xml:
...
<!-- A general text field that has reasonable, generic
     cross-language defaults: it tokenizes with StandardTokenizer,
     removes stop words from case-insensitive "stopwords.txt"
     (empty by default), and down cases. At query time only, it
     also applies synonyms. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
...
<field name="Title" type="text_general" indexed="true" stored="true"/>
I spent a lot of time getting this done. Here are the steps needed to query special characters in Solr. Hope it helps someone.
Under both the "index" and "query" analyzers, modify the WordDelimiterFilterFactory and add types="characters.txt".
Ensure that you use WhitespaceTokenizerFactory as the tokenizer.
Your characters.txt file lists the special characters you want to keep, one mapping per line.
Clear the data, re-index, and query for the special characters. It will work.
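The steps above could look roughly like the following sketch. The field type name "text_special" and the two mappings in characters.txt are assumptions chosen for illustration, not taken from the original answer; the WordDelimiterFilterFactory flags shown are ordinary options of that filter, and you may need different values.

```xml
<!-- Sketch only: "text_special" is a hypothetical field type name. -->
<fieldType name="text_special" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Split on whitespace only, so a lone "-" survives tokenization -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- types points at the file that reclassifies special characters -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            types="characters.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- Same chain at query time, so query tokens match indexed tokens -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            types="characters.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!--
  characters.txt is a plain text file next to schema.xml.
  Assumed example entries, one mapping per line:
    - => ALPHA
    ' => ALPHA
-->
```

Mapping a character to ALPHA makes the filter treat it as a letter instead of a delimiter, which is why a token consisting only of "-" is kept rather than dropped. The Title field would then have to reference this type instead of text_general.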
You are using the standard text_general field type for the title attribute. This might not be a good choice: text_general is meant for large chunks of text (or at least sentences), not for exact matching of names or titles.

The problem here is that text_general uses the StandardTokenizerFactory, which splits text on word boundaries and discards punctuation. This means the '-' character is never indexed as a token; it is only used as a place to split the string.
This also explains why select?q=title:\- won't work here.

Choose a better-fitting field type:
Instead of the StandardTokenizerFactory you could use solr.WhitespaceTokenizerFactory, which splits only on whitespace and therefore allows exact matching of words. Defining your own field type for the title attribute would be one solution. Solr also has a minimal field type called text_ws. Depending on your requirements, this might be enough.

To search for your exact phrase, put double quotes around it (for example, Title:"Titanic - 1999").
If you just want to search for the special character itself, you will need to escape it with a backslash (\-).
Also check: Special characters (-&+, etc) not working in SOLR Query
If you know exactly which special characters you don't want to use, you can add a rule for them to regex-normalize.xml.
The rule replaces every "-" with %2D, so searches will work as long as you search for %2D instead of "-".
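A rule of this kind might look like the sketch below. It assumes that regex-normalize.xml means the URL normalizer configuration of Apache Nutch, which the answer does not state explicitly; the pattern and substitution are taken from the description above.

```xml
<?xml version="1.0"?>
<!-- Sketch only: assumes the Apache Nutch regex-normalize.xml layout. -->
<regex-normalize>
  <regex>
    <!-- Match every literal hyphen -->
    <pattern>-</pattern>
    <!-- Replace it with its percent-encoded form -->
    <substitution>%2D</substitution>
  </regex>
</regex-normalize>
```

Note that this changes the stored values rather than how Solr analyzes them, so existing data would have to be re-crawled and re-indexed before searches for %2D can match.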