Solr paging 100 Million Document result set

Posted 2019-07-21 05:54

Question:

I understand the challenges/limitations around deep paging in Solr, and that some new features are being implemented. I am trying to do deep paging of very large result sets (e.g., over 100 million documents) using a separate indexed integer field into which I insert a random value (between 0 and some known MAXINT). When querying a large result set, I first run the query with zero rows returned; then, based on the count, I divide the range 0 to MAXINT into sub-ranges that should each yield about PAGE_COUNT results on average, and re-run the query filtered to each sub-range of the random field, grabbing all the rows in that range. Obviously the actual number of rows per range will vary, but it should follow a predictable distribution.
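The partitioning step described above can be sketched as follows. This is a minimal illustration, not the asker's actual code: the function name `bucket_ranges` and the field name `rand_i` are hypothetical, and it assumes the random field is uniformly distributed so each sub-range matches roughly `page_count` documents on average.

```python
def bucket_ranges(total_count, page_count, max_int):
    """Split [0, max_int] into contiguous sub-ranges so that, assuming the
    random field is uniformly distributed, each range matches roughly
    page_count documents on average."""
    if total_count <= page_count:
        return [(0, max_int)]
    num_buckets = -(-total_count // page_count)  # ceiling division
    step = max_int / num_buckets
    return [(round(i * step), round((i + 1) * step)) for i in range(num_buckets)]

# After the initial rows=0 query reports the total count, each (lo, hi) pair
# becomes a range filter on the hypothetical random field, e.g.
#   fq=rand_i:[lo TO hi}   (exclusive upper bound to avoid double-counting)
ranges = bucket_ranges(total_count=100_000_000, page_count=10_000, max_int=2**31 - 1)
```

Each sub-range query would then be issued with `rows` set high enough to capture the expected PAGE_COUNT plus some slack for the statistical variation the question mentions.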

I want to know - has anyone done this at scale? Should this work? I'll report my findings but wanted a bookmark on stackoverflow for this problem.

Answer 1:

Check the guide here. Cursors should be efficient enough if you don't want to overload Solr:

https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
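The cursor approach from the linked guide works roughly as below. This is a hedged sketch, not code from the guide: the `fetch` callable stands in for however you issue the HTTP request to Solr's `/select` handler, and the `id` field is assumed to be the collection's uniqueKey (cursorMark requires a sort that includes the uniqueKey as a tiebreaker).

```python
def iterate_with_cursor(fetch, page_size=100):
    """Walk an entire Solr result set using cursorMark pagination.

    `fetch(params)` should issue the query (e.g. an HTTP GET against
    /select?wt=json) and return the decoded JSON response body.
    Iteration stops when Solr returns the same cursorMark it was given,
    which signals the end of the result set.
    """
    cursor = "*"  # "*" is the initial cursor value
    while True:
        resp = fetch({
            "q": "*:*",
            "rows": page_size,
            "sort": "id asc",      # sort must include the uniqueKey field
            "cursorMark": cursor,
        })
        for doc in resp["response"]["docs"]:
            yield doc
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor => no more results
            break
        cursor = next_cursor
```

Unlike `start`-based paging, each request here costs roughly the same regardless of how deep into the result set it is, which is why cursors are the recommended approach for walking 100M+ documents.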



Tags: solr