CloudSearch performance with frequent updates of s

Posted 2019-08-02 00:24

Question:

I have a use case where I need to upload small document batches (typically 1 to 10 documents of 1 KB each) to CloudSearch. A new batch is uploaded every 2 or 3 seconds. The CloudSearch docs for bulk uploads say:

Make sure your batches are as close to the 5 MB limit as possible. Uploading a larger amount of smaller batches slows down the upload and indexing process.

It's OK if there is a 30-second delay before the documents show up in search results. Will my implementation keep working well as my document count grows, say to 500,000 docs?

Answer 1:

Indexing time should be well under your 30-second SLA even with 500k docs, regardless of how or whether you batch your submissions.

I say this based on my own testing with an index of 300k docs and 38 index fields on an m1.small instance type, where a document becomes searchable in less than 3 seconds. Many variables could affect your own situation, such as the number of index fields and your instance size, but I think my setup reflects unfavorable conditions (an m1.small instance with a complex indexing schema) and it is still an order of magnitude faster than your SLA. It's anecdotal evidence, of course, but you should be fine.
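
If you still want to move toward the docs' advice on larger batches without exceeding the 30-second tolerance, one option is to buffer documents and flush either when the batch nears 5 MB or when the oldest buffered document reaches a maximum age. The following is only a minimal sketch under assumptions: TimedBatcher, max_age_seconds and the upload callable are hypothetical names, not part of CloudSearch or any SDK. At roughly 10 KB every 2 to 3 seconds the size limit will almost never trigger, so in practice the age limit decides when a batch goes out.

```python
import json
import time

MAX_BATCH_BYTES = 5 * 1024 * 1024  # the 5 MB batch limit quoted from the CloudSearch docs


class TimedBatcher:
    """Hypothetical helper: buffers "add" operations and flushes when the
    batch would exceed the size limit or its oldest document is too old."""

    def __init__(self, upload, max_age_seconds=10.0, clock=time.monotonic):
        self.upload = upload              # callable that receives the JSON batch as bytes
        self.max_age = max_age_seconds    # assumed value, well under the 30-second tolerance
        self.clock = clock                # injectable so the timing can be tested
        self.ops = []
        self.size = 2                     # bytes for the enclosing "[" and "]"
        self.first_added = None

    def add(self, doc_id, fields):
        op = json.dumps({"type": "add", "id": doc_id, "fields": fields})
        op_size = len(op.encode("utf-8")) + 1  # +1 for the separating comma
        # Flush first if this document would push the batch over the limit.
        # (A single document larger than the limit is not handled here.)
        if self.ops and self.size + op_size > MAX_BATCH_BYTES:
            self.flush()
        if not self.ops:
            self.first_added = self.clock()
        self.ops.append(op)
        self.size += op_size
        self.flush_if_due()

    def flush_if_due(self):
        # Only checked when called; with no incoming documents a timer would be needed.
        if self.ops and self.clock() - self.first_added >= self.max_age:
            self.flush()

    def flush(self):
        if not self.ops:
            return
        body = ("[" + ",".join(self.ops) + "]").encode("utf-8")
        self.upload(body)
        self.ops, self.size, self.first_added = [], 2, None


# Possible wiring with boto3 (assumes boto3 is installed and that you
# substitute your own domain's document endpoint):
#   client = boto3.client("cloudsearchdomain", endpoint_url="<your document endpoint>")
#   batcher = TimedBatcher(lambda body: client.upload_documents(
#       documents=body, contentType="application/json"))
```

Given the timings reported above, this is optional: it reduces the number of upload requests but adds up to max_age_seconds of latency before a document is submitted.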