I have a Solr index with many entries, and a query returns some subset of them, each with a score. Once the results are returned with scores, I want to "keep" only the results that are above some score (i.e. results of a certain quality). Is it possible to do this when the returned subset could be anything?
I ask because on some queries a score of, say, 0.008 is a decent match, whereas on other queries a higher score is a poor match.
Ideally I'm just looking for a method to take the top x entries as long as they are of at least a certain quality.
Thanks in advance!
I think you should not do this. With the TF-IDF scoring model, there is no way to compute a score above which all results are relevant and below which none are. And even if you manage to find one, it is very likely that this threshold will no longer be valid after a few updates to your index (because document frequencies will change).
If you still want to do this, I think it is achievable using function queries: there are an if function (in trunk) and a query function available in Solr. Just filter your results so that you only keep entries which have a higher score than a given threshold.
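As a rough sketch of what such a filter could look like: one way to express it is a filter query that combines the frange query parser with the query function, so that only documents whose score for the main query is at or above a lower bound are kept. The helper name build_thresholded_params, the example query and the threshold 0.5 are made up for illustration, and this assumes your Solr version supports frange and query(); it only builds the request parameters and does not contact a server.

```python
from urllib.parse import urlencode


def build_thresholded_params(user_query, min_score, rows=10):
    """Build Solr request parameters that keep only docs scoring >= min_score.

    Hypothetical helper: the parameter values are illustrative assumptions.
    """
    return {
        "q": user_query,
        # Ask Solr to return the score alongside the stored fields.
        "fl": "*,score",
        "rows": rows,
        # frange keeps documents whose function value is >= l;
        # query($q) evaluates to each document's score for the main query.
        "fq": "{!frange l=%s}query($q)" % min_score,
    }


params = build_thresholded_params("title:solr", 0.5)
query_string = urlencode(params)  # append to your core's /select URL
```

The threshold here is still a raw score, so the caveat above applies: a value that works for one query or one state of the index may not work for another.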
You can implement something called normalized score (Scores As Percentages).
For more details, see:
How to normalize Lucene scores?
how do I normalise a solr/lucene score?
Remove results below a certain score threshold in Solr/Lucene?
You should also read ScoresAsPercentages first.
Solr does not normalize scores, since this is easily done on the client side.
You can use the maxScore provided in the results and divide every score by it.
The first record will then have a score of 1, and the rest will have lower scores.
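A minimal client-side sketch of this, assuming the response was fetched as JSON with the score requested in fl (so that maxScore and a per-document score are present) and that docs arrive in the default descending score order. The function name normalize_and_filter, the normalized_score key and the sample data are invented for illustration.

```python
def normalize_and_filter(solr_response, min_ratio, top_x):
    """Divide each score by maxScore, keep docs at or above min_ratio,
    and return at most top_x of them."""
    body = solr_response["response"]
    max_score = body.get("maxScore") or 0.0
    if max_score <= 0:
        # No hits, or scores were not requested: nothing to normalize.
        return []
    kept = []
    for doc in body["docs"]:
        ratio = doc["score"] / max_score
        if ratio >= min_ratio:
            # Copy the doc and attach the normalized value under a new key.
            kept.append({**doc, "normalized_score": ratio})
    return kept[:top_x]


# Made-up response in the shape of Solr's JSON output.
sample = {
    "response": {
        "maxScore": 0.8,
        "docs": [
            {"id": "a", "score": 0.8},
            {"id": "b", "score": 0.4},
            {"id": "c", "score": 0.08},
        ],
    }
}
result = normalize_and_filter(sample, min_ratio=0.3, top_x=2)
```

Note that the normalized value only says how a document compares to the best hit of the same query, not how good that best hit is, so a ratio cutoff is not an absolute quality measure either.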