I understand the challenges and limitations around deep paging in Solr, and that some new features are being implemented. I am trying to do deep paging of very large result sets (e.g., over 100 million documents) using a separate indexed integer field into which I insert a random value between 0 and some known MAXINT. When querying a large result set, I first run the field query with no rows returned to get the hit count. Based on that count, I divide the range 0 to MAXINT into sub-ranges sized so that each one matches PAGE_COUNT results on average. I then repeat the query once per sub-range of the random field and fetch all the rows in that range. Obviously the actual number of rows per sub-range will vary, but it should follow a predictable distribution (roughly binomial around PAGE_COUNT, assuming the random values are uniform and independent).
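To make the range-splitting step concrete, here is a minimal Python sketch of what I have in mind. It only computes the sub-ranges and builds the range filter strings; it does not talk to Solr. The field name `rand_i` and the helper names `page_ranges` and `range_filter` are placeholders for illustration, not anything from my actual schema or from a Solr client library.

```python
import math


def page_ranges(total_hits, max_int, page_count):
    """Split [0, max_int] into contiguous inclusive integer sub-ranges.

    The number of sub-ranges is chosen so that, if the random field is
    uniform, each one matches about `page_count` documents on average.
    """
    if total_hits <= 0:
        return []
    span = max_int + 1  # number of distinct values the random field can take
    num_pages = max(1, math.ceil(total_hits / page_count))
    num_pages = min(num_pages, span)  # never create an empty sub-range
    ranges = []
    for i in range(num_pages):
        lo = i * span // num_pages
        hi = (i + 1) * span // num_pages - 1  # inclusive upper bound
        ranges.append((lo, hi))
    return ranges


def range_filter(field, lo, hi):
    """Build an inclusive range clause in Solr's standard query syntax."""
    return f"{field}:[{lo} TO {hi}]"


if __name__ == "__main__":
    # Hypothetical numbers: 100 hits, random values in 0..999, ~25 per page.
    for lo, hi in page_ranges(100, 999, 25):
        print(range_filter("rand_i", lo, hi))
```

The idea is that `total_hits` comes from `numFound` on the initial `rows=0` query, and each generated clause is then sent as an `fq` alongside the original query, with `rows` set high enough to return everything in that sub-range.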
Has anyone done this at scale? Should it work? I'll report my findings, but I wanted a bookmark on Stack Overflow for this problem.