How to get all results from solr query?

2020-02-26 15:16发布

问题:

I executed some query like "Address:Jack*". It show numFound = 5214 and display 100 documents in results page(I changed default display results from 10 to 100).

How can I get all documents.

回答1:

I remember myself doing &rows=2147483647

2,147,483,647 is integer's maximum value. I recall using a number bigger than that once and having a NumberFormatException because it couldn't be parsed into an int. I don't know if they use Long nowadays, but 2 billion rows is normally more than enough.

Small note:
Be careful if you are planning to do this in production. If you do a query like * : * and your index is big, you could transferring a couple of gigabytes in that query.
If you know you won't have many docs, go ahead and use integer's max value.

On the other hand, if you are doing a one-time script and just need to dump all results (for example document ID's) then this approach is valid, if you don't mind waiting 3-5 minutes for a query to return.



回答2:

Returning all the results is never a good option as It would be very slow in performance.
Can you mention your use case ?

Also, Solr rows parameter helps you to tune the number of the results to be returned.
However, I don't think there is a way to tune rows to return all results. It doesn't take a -1 as value.
So you would need to set a high value for all the results to be returned.



回答3:

I suggest to use Deep Paging.

Simple Pagination is a easy thing when you have few documents to read and all you have to do is play with start and rows parameters. But this is not a feasible way when you have many documents, I mean hundreds of thousands or even millions.
This is the kind of thing that could bring your Solr server to their knees.

For typical applications displaying search results to a human user, this tends to not be much of an issue since most users don’t care about drilling down past the first handful of pages of search results — but for automated systems that want to crunch data about all of the documents matching a query, it can be seriously prohibitive.

This means that if you have a website and are paging search results, a real user do not go so further but consider on the other hand what can happen if a spider or a scraper try to read all the website pages.

Now we are talking of Deep Paging.

I’ll suggest to read this amazing post:

https://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/

And take a look at this document page:

https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results

And here is an example that try to explain how to paginate using the cursors.

SolrQuery solrQuery = new SolrQuery();
solrQuery.setRows(500);
solrQuery.setQuery("*:*");
solrQuery.addSort("id", ORDER.asc);  // Pay attention to this line
String cursorMark = CursorMarkParams.CURSOR_MARK_START;
boolean done = false;
while (!done) {
    solrQuery.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
    QueryResponse rsp = solrClient.query(solrQuery);
    String nextCursorMark = rsp.getNextCursorMark();
    for (SolrDocument d : rsp.getResults()) {
            ... 
    }
    if (cursorMark.equals(nextCursorMark)) {
        done = true;
    }
    cursorMark = nextCursorMark;
}


回答4:

What you should do is to first create a SolrQuery shown below and set the number of documents you want to fetch in a batch.

int lastResult=0; //this is for processing the future batch

String query = "id:[ lastResult TO *]"; // just considering id for the sake of simplicity

SolrQuery solrQuery = new SolrQuery(query).setRows(500); //setRows will set the required batch, you can change this to whatever size you want.

SolrDocumentList results = solrClient.query(solrQuery).getResults(); //execute this statement

Here I am considering an example of search by id, you can replace it with any of your parameter to search upon.

The "lastResult" is the variable you can change after execution of the first 500 records(500 is the batch size) and set it to the last id got from the results.

This will help you execute the next batch starting with last result from previous batch.

Hope this helps. Shoot up a comment below if you need any clarification.



回答5:

For selecting all documents in dismax/edismax via Solarium php client, the normal query syntax : does not work. To select all documents set the default query value in solarium query to empty string. This is required as the default query in Solarium is :. Also set the alternative query to :. Dismax/eDismax normal query syntax does not support :, but the alternative query syntax does.

For more details following book can be referred

http://www.packtpub.com/apache-solr-php-integration/book



回答6:

As the other answers pointed out, you can configure the rows to be max integer to yield back all the results for a query. I would recommend though to use Solr feature of pagination, and build a function that will return for you all the results using the cursorMark API. The gist of it is you set the cursorMark parameter to '*', you set the page size(rows parameter), and on each result you'll get a cursorMark for the next page, so you execute the same query only with the cursorMark given from the last result. This way you'll have more flexibility on how much of the results you want back, in a much more performant way.



回答7:

The way I dealt with the problem is by running the query twice:

// Start with your (usually small) default page size
solrQuery.setRows(50); 
QueryResponse response = solrResponse(query);
if (response.getResults().getNumFound() > 50) {
    solrQuery.setRows(response.getResults().getNumFound()); 
    response = solrResponse(query);
}

It makes a call twice to Solr, but gets you all matching records....with the small performance penalty.



回答8:

query.setRows(Integer.MAX_VALUE); works for me!!



标签: solr