Lucene/Solr - Query Analysis working, but Select h

Posted 2019-07-18 13:53

Question:

I have an issue with my Solr settings. The select handler does not find results for "canaDa" the way it does for "canada".

Here is the schema for the fieldType text_en_splitting (all of the relevant fields use it):

<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>

  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>


Here are the solrconfig.xml settings for the select handler:

<requestHandler name="/select" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <int name="rows">20</int>
       <str name="df">text</str>

       <str name="defType">edismax</str>
       <str name="qf">court_id^0.1 jurisdiction^1.0 jur_code^0.5 court_name^1.5 court_code^0.5 court_type^1.0</str>
       <str name="mm">80%</str>
       <str name="q.alt">*:*</str>
       <str name="fl">*</str>
     </lst>
</requestHandler>


Here is the output of the Query Analysis tool in the Solr admin UI (the screenshot is not preserved here).

As you can see, the Query Analysis tool does split "canaDa" into tokens, but the search still can't find it.

Answer 1:

The behavior you are seeing is correct given how the text_en_splitting fieldType is configured. With this configuration, "canaDa" only matches when the indexed term is also "canaDa", because then both are split into "cana" and "da". If you want "canaDa" to match "canada", I suggest removing the splitOnCaseChange="1" option from the WordDelimiterFilterFactory, since that option is what causes the split.
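
As a minimal sketch of that suggestion, assuming the rest of the text_en_splitting definition stays exactly as posted in the question, the two WordDelimiterFilterFactory lines might look like this with case-change splitting turned off. Every attribute other than splitOnCaseChange is copied unchanged from the question.

```xml
<!-- Sketch only: these two lines replace the matching lines in the
     text_en_splitting fieldType; nothing else in the fieldType changes. -->

<!-- index analyzer: splitOnCaseChange switched from "1" to "0" -->
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" />

<!-- query analyzer: splitOnCaseChange switched from "1" to "0" -->
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" preserveOriginal="1" />
```

With this change "canaDa" should stay a single token and be lowercased to "canada" by the LowerCaseFilterFactory that follows. Documents indexed before the change keep their old terms until they are reindexed.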

If removing the splitOnCaseChange setting is not an option, can you explain your requirements and expected behavior in more detail in the question, so we can help you find a workable solution?