Best Practice of Field Collapsing in SOLR 1.4

Posted 2019-04-29 07:49

Question:

I need a way to collapse duplicate results in Solr, where duplicates are defined by a string field holding an id. I know that such a feature is coming in the next version (1.5), but I can't wait for that. What would be the best way to remove duplicates using the current stable version, 1.4?

Given that finding duplicates in my case is really easy (a comparison of a string field), should it be a Filter? Should I override the existing SearchComponent, write a new component, or use an external library such as Carrot2?

The overall result count should reflect the collapsed (shortened) result set.

Answer 1:

Well, there is a solution: apply the field collapsing patch (see http://issues.apache.org/jira/browse/SOLR-236 for the latest news about this feature; I also recommend http://blog.jteam.nl/author/martijn).

Once the patch is applied, you get a working CollapseComponent. Note that this feature comes with some search performance degradation.
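To make clear what "collapsing on a string field" means, here is a minimal standalone Java sketch. It is not the SOLR-236 patch and uses no Solr API; the class name `CollapseSketch`, the `collapse` helper, and the `dup_id` field are hypothetical names chosen for illustration. It assumes the documents are already in ranked order and keeps only the first document for each distinct field value.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CollapseSketch {

    /**
     * Keeps the first (highest-ranked) document for each distinct value of
     * the given field. Documents that lack the field are kept as they are.
     * Illustration only: this is an assumption about the desired behaviour,
     * not how the CollapseComponent is implemented.
     */
    public static List<Map<String, String>> collapse(
            List<Map<String, String>> rankedDocs, String field) {
        Set<String> seen = new HashSet<>();
        List<Map<String, String>> collapsed = new ArrayList<>();
        for (Map<String, String> doc : rankedDocs) {
            String key = doc.get(field);
            // Set.add returns false when the key was already present,
            // so later duplicates are skipped.
            if (key == null || seen.add(key)) {
                collapsed.add(doc);
            }
        }
        return collapsed;
    }

    public static void main(String[] args) {
        // "dup_id" is a hypothetical name for the string field that marks duplicates.
        List<Map<String, String>> docs = List.of(
                Map.of("id", "1", "dup_id", "a"),
                Map.of("id", "2", "dup_id", "b"),
                Map.of("id", "3", "dup_id", "a"));

        List<Map<String, String>> collapsed = collapse(docs, "dup_id");

        // The size of the collapsed list is the count the question asks for.
        System.out.println("collapsed count: " + collapsed.size());
    }
}
```

Doing this on the client only produces a correct overall count if every matching document is fetched, which is one reason a server-side component such as the patched CollapseComponent is the more practical route.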



Tags: solr