
azure search work around $skip limit

Posted 2020-04-16 02:24

Question:

I'm running a job to check whether all records from my database exist in Azure Search (around 610k). However, there's a 100,000 limit on the $skip parameter. Is there a way to work around this limit?

Answer 1:

You cannot page past 100K documents with $skip; however, you can use filters (e.g. on a facetable field) to work around this. For example, let's say you have a field called Country and no single country has more than 100K documents. You can page over all documents where Country eq 'Canada', then over all documents where Country eq 'USA', and so on.
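
A minimal sketch of that partitioning idea, assuming the azure-search-documents Python SDK and a hypothetical filterable Country field; the endpoint, index name, key, and country values are placeholders:

    # Sketch: page each Country partition separately so no single query
    # ever needs $skip beyond 100k. Assumes no country exceeds 100k documents.
    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient

    client = SearchClient(
        endpoint="https://<your-service>.search.windows.net",   # placeholder
        index_name="<your-index>",                               # placeholder
        credential=AzureKeyCredential("<query-key>"),            # placeholder
    )

    countries = ["Canada", "USA", "Mexico"]   # hypothetical facet values
    counts = {}

    for country in countries:
        skip, count = 0, 0
        while True:
            # Each partition is paged on its own, so skip stays under 100k.
            page = list(client.search(
                search_text="*",
                filter=f"Country eq '{country}'",
                top=1000,
                skip=skip,
            ))
            if not page:
                break
            count += len(page)
            skip += len(page)
        counts[country] = count

    print(counts)

In practice you would also order each partition by a stable key so that deep paging returns consistent pages, as the other answers below do.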



Answer 2:

I use the metadata_storage_last_modified field as a filter; the following is my example.

    offset      skip = offset % 101,000    action
    0           0
    100,000     100,000                    getLastTime
    101,000     0                          useLastTime
    200,000     99,000                     useLastTime
    201,000     100,000                    useLastTime & getLastTime
    202,000     0                          useLastTime

Because the skip limit is 100k, we can calculate skip as

AzureSearchSkipLimit = 100k
AzureSearchTopLimit = 1k
skip = offset % (AzureSearchSkipLimit + AzureSearchTopLimit)

If the total search count is larger than AzureSearchSkipLimit, then apply

orderby = "metadata_storage_last_modified desc"

When skip reaches AzureSearchSkipLimit, get the metadata_storage_last_modified time from the last document returned, and use it as the filter for the next 100k of the search:

filter = metadata_storage_last_modified lt ${metadata_storage_last_modified}
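
Putting the pieces together, here is a minimal sketch of this approach, assuming the azure-search-documents Python SDK; the endpoint, index name, and key are placeholders, and metadata_storage_last_modified is assumed to be sortable and filterable:

    # Sketch: walk the whole index 1k documents at a time, restarting the $skip
    # counter every 101k documents by rolling the timestamp filter forward.
    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient

    AZURE_SEARCH_SKIP_LIMIT = 100_000
    AZURE_SEARCH_TOP_LIMIT = 1_000

    client = SearchClient(
        endpoint="https://<your-service>.search.windows.net",   # placeholder
        index_name="<your-index>",                               # placeholder
        credential=AzureKeyCredential("<query-key>"),            # placeholder
    )

    last_time = None   # metadata_storage_last_modified of the last seen document
    offset = 0

    while True:
        skip = offset % (AZURE_SEARCH_SKIP_LIMIT + AZURE_SEARCH_TOP_LIMIT)

        filter_expr = None
        if last_time is not None:
            # The SDK returns the field as a datetime; format it as an OData literal.
            literal = last_time.isoformat().replace("+00:00", "Z")
            filter_expr = f"metadata_storage_last_modified lt {literal}"

        page = list(client.search(
            search_text="*",
            order_by=["metadata_storage_last_modified desc"],
            filter=filter_expr,
            select=["metadata_storage_last_modified"],
            top=AZURE_SEARCH_TOP_LIMIT,
            skip=skip,
        ))
        if not page:
            break

        offset += len(page)
        # getLastTime: when this page sits at the skip limit, remember its last
        # timestamp so the next window can restart at skip = 0 (useLastTime).
        if skip == AZURE_SEARCH_SKIP_LIMIT:
            last_time = page[-1]["metadata_storage_last_modified"]

    print(f"scanned {offset} documents")

Note that this assumes metadata_storage_last_modified values are effectively unique; documents sharing the boundary timestamp could be skipped or repeated when the filter rolls forward.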


Answer 3:

Just to clarify the other answers: you can't bypass the limit directly, but you can use a workaround.

Here's what you can do:

1) Add a unique field to the index. Its contents can be a modification timestamp (if it's granular enough to be unique) or, for example, a running number. Alternatively, you can use an existing unique field.

2) Take the first 100,000 results from the index, ordered by your unique field.

3) Check the maximum value of your unique field in those results (assuming ascending order), i.e. the value of the last entry.

4) Take the next 100,000 results by ordering on the same unique field and adding a filter that only takes results where the unique field's value is bigger than the previous maximum. This way the first 100,000 values are not returned again and we get the next 100,000 values.

5) Continue until you have all results

The downside is that you can't use any other custom ordering on the results unless you sort them after retrieval.
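
A minimal sketch of steps 1-5, again assuming the azure-search-documents Python SDK; the unique field here is a hypothetical numeric running number called seq, and the endpoint, index name, and key are placeholders:

    # Sketch: keyset-style paging. Each outer pass covers at most 100k documents
    # ordered by the unique field; the filter then jumps past everything already seen.
    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient

    client = SearchClient(
        endpoint="https://<your-service>.search.windows.net",   # placeholder
        index_name="<your-index>",                               # placeholder
        credential=AzureKeyCredential("<query-key>"),            # placeholder
    )

    PAGE = 1_000
    last_seq = None   # maximum value of the unique field seen so far
    checked = 0

    while True:
        skip = 0
        page = []
        # Steps 2/4: take up to 100k results ordered by the unique field,
        # filtered to values greater than the previous window's maximum.
        while skip < 100_000:
            page = list(client.search(
                search_text="*",
                order_by=["seq asc"],
                filter=f"seq gt {last_seq}" if last_seq is not None else None,
                select=["seq"],
                top=PAGE,
                skip=skip,
            ))
            if not page:
                break
            checked += len(page)
            skip += len(page)
        if not page:
            break
        # Step 3: the last entry holds this window's maximum unique-field value.
        last_seq = page[-1]["seq"]

    print(f"checked {checked} documents")

If the unique field is a string rather than a number, the filter literal needs quoting (seq gt 'value') and the comparison follows string ordering.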