Different indexing and search strategies on same f

Posted 2019-09-04 13:06

Question:

For a phrase search, we want to return results only when there is an exact match (without ignoring stopwords). For a non-phrase search, we are fine returning results even if only the root form of a word matches.

We currently pass our data through StandardTokenizer, StopFilter, PorterStemFilter and LowerCaseFilter. Because of the stemming, when a user searches for "password management", the search also returns results containing "password manager".

If I remove PorterStemFilter, I will no longer be able to match on the root form of a word for non-phrase queries. I was wondering whether I should index the same data in two fields of the document.

For the first field (used for phrase searches), the following tokenizer and filter would be used: StandardTokenizer, LowerCaseFilter

For the second field (used for non-phrase searches): StandardTokenizer, StopFilter, PorterStemFilter, LowerCaseFilter

Then, depending on whether it is a phrase search or not, I need to rewrite the user's query to search the appropriate field.

Is this the right way to address this issue? Is there any other way to achieve this without doubling the index size?

Let's say the user's query is summary:"Furthermore, we should also fix this"

Internally this will be translated to summary_field1:"Furthermore, we should also fix this"

If the user's query is summary:(Furthermore, we should also fix this)

Internally this will be translated to +summary_field2:furthermor +summary_field2:we +summary_field2:should +summary_field2:also +summary_field2:fix

Both summary_field1 and summary_field2 index the same data. summary_field1 passes through only StandardTokenizer and LowerCaseFilter, whereas summary_field2 passes through StandardTokenizer, StopFilter, PorterStemFilter and LowerCaseFilter.
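The rewriting step described above could be sketched roughly as follows. This is only an illustration in plain Java with no Lucene dependency: the class name FieldRouter and its rewrite method are hypothetical, and it assumes the query has exactly the shape field:"phrase" or field:(terms) and that the two fields follow the _field1 / _field2 naming used in this question. It only swaps the field name; stopword removal and stemming are left to whatever analyzer is configured for the target field when the rewritten query is parsed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: routes a fielded query to the exact or the stemmed field. */
final class FieldRouter {
    // group 1 = field name, group 2 = quoted phrase, group 3 = parenthesized terms
    private static final Pattern FIELDED =
            Pattern.compile("^(\\w+):(?:\"([^\"]*)\"|\\(([^)]*)\\))$");

    static String rewrite(String userQuery) {
        Matcher m = FIELDED.matcher(userQuery.trim());
        if (!m.matches()) {
            return userQuery; // unrecognized syntax: leave the query untouched
        }
        String field = m.group(1);
        if (m.group(2) != null) {
            // Phrase search: use the field indexed without stop/stem filters.
            return field + "_field1:\"" + m.group(2) + "\"";
        }
        // Non-phrase search: use the stemmed, stopword-filtered field.
        return field + "_field2:(" + m.group(3) + ")";
    }
}
```

With this sketch, FieldRouter.rewrite applied to summary:"Furthermore, we should also fix this" yields the same query on summary_field1, and the parenthesized form is redirected to summary_field2. A real implementation would more likely hook into the query parser than use a regular expression, since nested or multi-clause queries are not handled here.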

Please let me know if I'm missing something here.

Answer 1:

Defining two different fields lets you search for exact matches. By using boosts you can also query both fields in a single query and rank the exact matches higher. For example:

(firstField:"password management")^5 OR (secondField:"password management")^1
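As a rough sketch, a boosted query string like the one above could be assembled as follows. The class BoostedQuery and its build method are hypothetical names, the code is plain string handling rather than a Lucene or Solr API, and it assumes the only characters that need escaping inside the quoted phrase are the backslash and the double quote.

```java
/** Hypothetical helper: builds one query over the exact and the stemmed field. */
final class BoostedQuery {
    static String build(String phrase, String exactField,
                        String stemmedField, int exactBoost) {
        // Escape backslashes first, then double quotes, before quoting the phrase.
        String quoted = "\"" + phrase.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        return "(" + exactField + ":" + quoted + ")^" + exactBoost
                + " OR (" + stemmedField + ":" + quoted + ")^1";
    }
}
```

Calling BoostedQuery.build("password management", "firstField", "secondField", 5) produces the example query. Because the two clauses are joined with OR, documents that match only the stemmed field are still returned, just ranked lower, so this fits cases where exact matches should come first rather than be the only results.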


Tags: solr lucene