Max scrollable time for Elasticsearch

Posted 2019-08-28 15:33

Question:

What is the maximum scroll time that can be set for a scrolling search?

Documentation: https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/api-reference.html#api-scroll

Answer 1:

If you're asking this kind of question, you're probably not using Scroll in ES the way it was intended. Use scroll when you know for sure that you need to return ALL matching records.

Great use case for Scroll

I want to pull back 1,000,000 records from ES to be written to a CSV file. This is a perfect use case for scroll. You need to return 1M rows, but you don't want to return them all as one chunk from the database. Instead you can split them into chunks of ~1000 records, write each chunk to the CSV file, then get the next chunk. Your scroll keep-alive can be set to 1 minute and you'll have no problems, because it only has to cover the time needed to process one chunk: every scroll request renews it.
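
The export loop described above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes a client object that exposes search(), scroll() and clear_scroll() calls shaped like those of the official Python client (keyword names differ between client versions), and the function name export_to_csv, the index name and the field list are hypothetical.

```python
import csv


def export_to_csv(client, index, out_file, fields, chunk_size=1000, keep_alive="1m"):
    """Scroll through every document in `index` and write `fields` as CSV rows.

    Assumption: `client` offers search(), scroll() and clear_scroll() and
    returns responses containing "_scroll_id" and "hits" -> "hits".
    Returns the number of rows written.
    """
    writer = csv.writer(out_file)
    writer.writerow(fields)
    total = 0

    # The first request opens the scroll context and returns the first chunk.
    resp = client.search(
        index=index,
        scroll=keep_alive,
        size=chunk_size,
        body={"query": {"match_all": {}}},
    )
    scroll_id = resp["_scroll_id"]
    try:
        while resp["hits"]["hits"]:
            for hit in resp["hits"]["hits"]:
                writer.writerow([hit["_source"].get(f, "") for f in fields])
                total += 1
            # Each scroll call renews the keep-alive, so the keep-alive only
            # has to cover the processing time of a single chunk.
            resp = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp["_scroll_id"]
    finally:
        # Release the search context instead of waiting for it to expire.
        client.clear_scroll(scroll_id=scroll_id)
    return total
```

With a real client this might be called as export_to_csv(es, "my-index", open("out.csv", "w", newline=""), ["id", "name"]), where "my-index" and the field names are placeholders.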

Bad use case for Scroll

A user is viewing the first 50 records and at some time in the future, they may or may not want to view the next 50 records.

For a use case like this, you want to use the Search After API instead.
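
As a rough illustration of that paging style, the sketch below builds the request body for each page. It relies on the search_after request parameter and on the "sort" array that Elasticsearch returns with each hit when a sort is specified; the helper name next_page_body and the field names "timestamp" and "id" are hypothetical. The sort should end in a unique tiebreaker field so that pages do not overlap.

```python
def next_page_body(query, sort, page_size, last_hit=None):
    """Build a search body for one page of results.

    For the first page, pass last_hit=None. For later pages, pass the last
    hit of the previous page; its "sort" values tell Elasticsearch where
    the next page starts. No server-side context is kept between pages.
    """
    body = {"query": query, "sort": sort, "size": page_size}
    if last_hit is not None:
        body["search_after"] = last_hit["sort"]
    return body


# Hypothetical sort: a timestamp plus a unique tiebreaker field.
sort = [{"timestamp": "asc"}, {"id": "asc"}]
first_page = next_page_body({"match_all": {}}, sort, 50)
# Later, whenever the user asks for the next 50 records:
last_hit = {"_id": "x", "sort": [1566999180000, "x"]}
second_page = next_page_body({"match_all": {}}, sort, 50, last_hit)
```

Because nothing is held open on the cluster between requests, it does not matter whether the user asks for the next page after a second or after an hour.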



Answer 2:

There is no one-size-fits-all value for the max scroll time.

Scan & Scroll is meant to scan through a large number of records in chunks. The maximum size and timeout for each chunk have to be found by increasing them incrementally until you hit the breaking point, because they depend on your cluster resources, network latency, and cluster load.

We had a 3-node test setup with about 1 billion records and 1 TB of data. I was able to scroll through the entire index with a scroll size of 5000 and a timeout of 5m. However, there were lots of timeouts with those values. From our analysis, we observed that scroll timeouts were heavily dependent on cluster load and network latency, so we finally settled on a size of 3500 and a 4m timeout.

So I would recommend the following:

  • Incrementally increase the size and timeout values to get the max value for your network.
  • Once you have the max value, reduce it a notch to allow for failures due to cluster load and latency.
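
The two steps above could be automated along these lines. This is only a sketch under stated assumptions: run_trial is a hypothetical callback that performs one full scroll with the given size and returns True if it finished without timeouts, and the start, step and limit defaults are illustrative values, not recommendations. The same loop could be reused for the timeout value.

```python
def tune_scroll_size(run_trial, start=500, step=500, limit=10000, backoff_steps=1):
    """Find a scroll size by incremental increase, then back off a notch.

    run_trial(size) is assumed to return True when a scroll with that size
    completed without timeouts. Returns None if even `start` fails.
    """
    best = None
    size = start
    # Step 1: increase the size until a trial fails or the cap is reached.
    while size <= limit and run_trial(size):
        best = size
        size += step
    if best is None:
        return None
    # Step 2: step back from the largest working size to leave headroom
    # for cluster load and network latency, but never go below `start`.
    return max(start, best - backoff_steps * step)


# Simulated cluster that starts timing out above a size of 5000.
chosen = tune_scroll_size(lambda size: size <= 5000)
```

In this simulated run the largest working size is 5000, so the function backs off by one step and picks 4500.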