Azure Search: working around the $skip limit

Posted 2020-04-16 02:44

I'm running a job to check whether all records from my database (around 610k) exist in Azure Search. However, there's a 100,000 limit on the $skip parameter. Is there a way to work around this limit?

3 Answers
beautiful°
2020-04-16 03:03

Just to clarify the other answers: you can't bypass the limit directly, but you can use a workaround.

Here's what you can do:

1) Add a unique field to the index. The contents can be a modification timestamp (if it's granular enough to be unique) or, for example, a running number. Alternatively, you can use an existing unique field.

2) Take the first 100,000 results from the index, ordered by your unique field.

3) Note the maximum value of your unique field in those results (the value of the last entry, when ordering ascending).

4) Take the next 100,000 results, ordered by the same unique field, adding a filter that only takes results where the unique field's value is greater than the previous maximum. This way the first 100,000 values are not returned again and you get the next 100,000 values.

5) Continue until you have all results.

The downside is that you can't use any other ordering with the results unless you sort them client-side after retrieval.
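The steps above are keyset (seek) pagination and can be sketched in Python. This is a minimal sketch, not Azure-specific: `search_page` stands in for whatever client call you use (the REST API's `$filter`/`$orderby` or an SDK), and `id` is the assumed unique field:

```python
def fetch_all(search_page, page_size=1000):
    """Page through an index using keyset pagination on a unique field.

    search_page(filter_expr, top) must return documents ordered ascending
    by the unique field 'id'; filter_expr is an OData expression, or None
    for the first page.
    """
    results = []
    last_id = None
    while True:
        flt = None if last_id is None else f"id gt '{last_id}'"
        page = search_page(flt, page_size)
        if not page:
            break
        results.extend(page)
        last_id = page[-1]["id"]  # max unique-field value in this page
    return results
```

Because each request filters on the last value seen rather than skipping a count of documents, `$skip` is never needed and the 100k limit doesn't apply.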

干净又极端
2020-04-16 03:14

You cannot page over more than 100K documents; however, you can use facets to work around this. For example, let's say you have a facet called Country and no single facet value has more than 100K documents. You can page over all documents where Country == 'Canada', then over all documents where Country == 'USA', etc.
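A minimal sketch of this partitioning approach, assuming a hypothetical `search_page(filter_expr, skip, top)` call and the `Country` field from the example; within each partition `$skip` stays under the 100K limit:

```python
def fetch_by_partition(search_page, countries, page_size=1000):
    """Page through the index one partition at a time, so $skip never
    exceeds the limit within any single partition."""
    results = []
    for country in countries:
        skip = 0
        while True:
            page = search_page(f"Country eq '{country}'", skip, page_size)
            if not page:
                break
            results.extend(page)
            if len(page) < page_size:  # short page: partition exhausted
                break
            skip += page_size
    return results
```

The facet counts returned by the service can tell you in advance whether any partition exceeds 100K documents, in which case you would need to partition further.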

家丑人穷心不美
2020-04-16 03:17

I use the metadata_storage_last_modified field as a filter; the following is my example.

    offset      skip       action
    0           0
    100,000     100,000    getLastTime
    101,000     0          useLastTime
    200,000     99,000     useLastTime
    201,000     100,000    useLastTime & getLastTime
    202,000     0          useLastTime

Because the skip limit is 100k, we can calculate skip as:

AzureSearchSkipLimit = 100k
AzureSearchTopLimit = 1k
skip = offset % (AzureSearchSkipLimit + AzureSearchTopLimit)

If the total search count is larger than AzureSearchSkipLimit, then apply

orderby = "metadata_storage_last_modified desc"

When skip reaches AzureSearchSkipLimit, get the metadata_storage_last_modified time from the end of the data, and use that metadata_storage_last_modified as the filter for the next 100k of the search:

filter = metadata_storage_last_modified lt ${metadata_storage_last_modified}
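The calculation above can be sketched as follows (constants, field name, and filter shape are from the answer; the function names are mine):

```python
AZURE_SEARCH_SKIP_LIMIT = 100_000
AZURE_SEARCH_TOP_LIMIT = 1_000
# skip wraps around every 101k documents; each wrap starts a new window
# whose filter uses the last metadata_storage_last_modified value seen
# at the end of the previous window.
WINDOW = AZURE_SEARCH_SKIP_LIMIT + AZURE_SEARCH_TOP_LIMIT

def effective_skip(offset):
    """Map a logical offset to the $skip value actually sent."""
    return offset % WINDOW

def build_filter(last_time):
    """OData filter applied from the second window onward (useLastTime);
    None means no filter (the first window)."""
    if last_time is None:
        return None
    return f"metadata_storage_last_modified lt {last_time}"
```

Note that with `orderby = "metadata_storage_last_modified desc"` the `lt` filter restarts each window just past the previously retrieved documents, which is why `effective_skip` can wrap back to 0.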