Solrj Query - Get the most relevant record first

Posted 2019-02-15 20:13

Question:

I have some documents in Solr 4.0. I want the most relevant records to be displayed first and then the less relevant ones.

For example, I have 3 documents with the following titles:

  1. Towards Income Distribution Policy
  2. Income distribution and economic policies
  3. Income Distribution Policy in Developing Countries

When I run a query such as q=title:Income Distribution Policy, I would like document 3 to show up first (its first three words are an exact match), then document 1 (everything matches except the leading "Towards"), and finally document 2 (other words sit between the matching terms).

My schema.xml looks like this:

<types>
  <fieldType name="search" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>
</types>

<fields>
   <field name="title" type="search" indexed="true" stored="true"/>
</fields>

EDIT 1 Debug output

"rawquerystring": "title:Income Distribution Policy",
"querystring": "title:Income Distribution Policy",
"parsedquery": "title:incom title:distribut title:polici",
"parsedquery_toString": "title:incom title:distribut title:polici"

EDIT 2 Modified the fieldType

I tried the following combinations, but the output is still the same.

  1. StandardTokenizerFactory - autoGeneratePhraseQueries (not present) - PorterStemFilterFactory
  2. StandardTokenizerFactory - autoGeneratePhraseQueries="true" - PorterStemFilterFactory
  3. StandardTokenizerFactory - autoGeneratePhraseQueries (not present)
  4. StandardTokenizerFactory - autoGeneratePhraseQueries="true"
  5. WhitespaceTokenizerFactory - autoGeneratePhraseQueries (not present) - PorterStemFilterFactory
  6. WhitespaceTokenizerFactory - autoGeneratePhraseQueries="true" - PorterStemFilterFactory
  7. WhitespaceTokenizerFactory - autoGeneratePhraseQueries (not present)
  8. WhitespaceTokenizerFactory - autoGeneratePhraseQueries="true"

Answer 1:

If you don't sort by anything else, Solr sorts by relevance score (similarity). So if you are not getting the results in the order you expect, you may need to adjust how you assign weights and which query parser you use.

I assume you are using eDisMax with a boost on the title field. In addition, have a look at mm (minimum should match) and pf (phrase fields) for boosting.

You may also want to test with the autoGeneratePhraseQueries attribute set on your fieldType.

And, of course, debugQuery=true on the queries will help you see what is going on. You may find that adding debug.explain.structured=true is also useful the first couple of times you try to read the debug output.
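
To make the suggestions above concrete, here is a rough sketch of the request parameters they translate to. It uses only the JDK so it runs without a Solr client on the classpath. The class name EdismaxParams is hypothetical, and the boost of 10 on pf and the value 2 for mm are assumed starting points to tune, not recommended settings.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects eDisMax request parameters for a title search.
public class EdismaxParams {

    /** Builds the parameters for an eDisMax query against the title field. */
    public static Map<String, String> build(String userQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("defType", "edismax");  // select the eDisMax query parser
        params.put("q", userQuery);        // plain user input, no field prefix needed
        params.put("qf", "title");         // field the individual terms are matched against
        params.put("pf", "title^10");      // extra score when the whole input matches as a phrase (boost is an assumption)
        params.put("mm", "2");             // at least two query clauses must match (assumed value)
        params.put("debugQuery", "true");  // include parsed query and score explanations
        params.put("debug.explain.structured", "true"); // explanations as nested structures
        return params;
    }

    /** Renders the parameters as a URL query string, in insertion order. */
    public static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always supported by the JDK
        }
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(build("Income Distribution Policy")));
    }
}
```

The resulting string can be appended to the select handler URL of your core. In SolrJ the same name/value pairs can be passed through SolrQuery.set(name, value) instead of being encoded by hand.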



Answer 2:

I tried putting double quotes around the query string and it worked.

Like this: q=title:"Income Distribution Policy" OR title:Income Distribution Policy

This returned document 1, then document 3, and then document 2. Not perfect, but close enough.
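
A possible refinement of this query, sketched below, is to boost the phrase clause explicitly and to group the loose terms so that all of them are searched in the title field rather than the default field. The class name PhraseBoostQuery and the boost value are assumptions for illustration, and the sketch assumes the input consists of plain words (for arbitrary user input, SolrJ's ClientUtils.escapeQueryChars would be the safer route). Note that documents 1 and 3 both contain the exact phrase, so the remaining difference between them most likely comes from field length normalization, which tends to favor the shorter title; a phrase boost alone will not change that.

```java
// Hypothetical helper: builds a standard-parser query that prefers exact phrase matches.
public class PhraseBoostQuery {

    /** Returns: field:"words"^boost OR field:(words) */
    public static String build(String field, String userInput, int phraseBoost) {
        // Collapse runs of whitespace so the phrase and the term group look the same.
        String words = userInput.trim().replaceAll("\\s+", " ");
        // Escape backslashes and double quotes so the input cannot end the phrase early.
        String phrase = words.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + phrase + "\"^" + phraseBoost
                + " OR " + field + ":(" + words + ")";
    }

    public static void main(String[] args) {
        System.out.println(build("title", "Income Distribution Policy", 10));
    }
}
```

For the example above this produces title:"Income Distribution Policy"^10 OR title:(Income Distribution Policy), which can be sent as the q parameter.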