Is there a way to iterate over a SolrJ response such that the results are fetched incrementally during iteration, rather than returning a giant in-memory ArrayList?
Or do we have to resort to this:
SolrQuery query = new SolrQuery();
query.setQuery("*:*");
int fetchSize = 1000;
query.setRows(fetchSize);
QueryResponse rsp = server.query(query);
long offset = 0;
long totalResults = rsp.getResults().getNumFound();
while (offset < totalResults)
{
    query.setStart((int) offset); // requires an int? wtf?
    query.setRows(fetchSize);
    for (SolrDocument doc : server.query(query).getResults())
    {
        log.info((String) doc.getFieldValue("title"));
    }
    offset += fetchSize;
}
And while I'm on the topic, why does SolrQuery.setStart() require an integer, when SolrDocumentList.getStart()/getNumFound() return long?
That code looks correct. You could also wrap it in an Iterator so that your client code doesn't have to know anything about the underlying paging.
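To illustrate the Iterator idea, here is a minimal sketch that hides the paging behind java.util.Iterator. It is not part of SolrJ: PagingIterator and PageFetcher are hypothetical names made up for this example, and the fetcher is assumed to return fewer than the requested number of rows once the results run out. With SolrJ, the fetcher would presumably do what the loop in the question does: call query.setStart(start) and query.setRows(rows), then return server.query(query).getResults().

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Lazily walks a paged result set, fetching one page at a time. */
class PagingIterator<T> implements Iterator<T> {

    /** Hypothetical callback: returns up to `rows` items starting at offset `start`. */
    interface PageFetcher<T> {
        List<T> fetch(int start, int rows);
    }

    private final PageFetcher<T> fetcher;
    private final int pageSize;
    private int nextStart = 0;                                  // offset of the next page to request
    private Iterator<T> current = Collections.emptyIterator();  // items of the page being consumed
    private boolean exhausted = false;                          // true once a short page was seen

    PagingIterator(PageFetcher<T> fetcher, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.fetcher = fetcher;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        // Fetch the next page only when the current one has been used up.
        while (!current.hasNext() && !exhausted) {
            List<T> page = fetcher.fetch(nextStart, pageSize);
            nextStart += page.size();
            // Assumption: a short (or empty) page means there is nothing left.
            exhausted = page.size() < pageSize;
            current = page.iterator();
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

Client code then just loops with hasNext()/next() and never sees an offset. One trade-off of detecting the end by a short page: when the total is an exact multiple of the page size, the iterator issues one extra request that comes back empty. Using getNumFound() from the first response would avoid that.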
As for SolrQuery.setStart() requiring an Integer, it certainly looks odd. I think you're right that it should be a long as well. Try asking on the solr-user or lucene-dev mailing lists.

The reason, Caffeine, is that Solr is designed to give you the top X search results. The expectation is that you will have a "reasonable" number to return. If Solr has to look deep into the search results (into the thousands), you're going against the grain of what Solr was designed for. It will work, but the query response will get progressively slower the deeper into the search results you have to go. There is some ongoing work in Solr to make this use case more efficient, but I've seen no progress on it lately.