How to handle multiple updates / deletes with Elasticsearch

Posted 2019-05-20 04:36

Question:

I need to update or delete several documents.

When I update, I do this:

  1. I first search for the documents, setting a large limit on the number of returned results (say, size: 10000).
  2. For each returned document, I modify certain values.
  3. I resend the whole modified list to Elasticsearch (bulk index).

I repeat this until step 1 no longer returns results.
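The update loop described above can be sketched in Python as follows. This is a minimal illustration, not real client code: a plain dict (`_id` to source) stands in for the index, and `search_batch`, `bulk_index` and `update_until_done` are hypothetical names for steps 1 to 3.

```python
# Minimal sketch of the search / modify / bulk-index loop, assuming a
# hypothetical in-memory dict (_id -> source) in place of a real index.
# search_batch and bulk_index are illustrative names, not a client API.

def search_batch(index, matches, size):
    """Step 1: return up to `size` (id, source) pairs matching the query."""
    hits = [(doc_id, src) for doc_id, src in index.items() if matches(src)]
    return hits[:size]


def bulk_index(index, docs):
    """Step 3: replace each document wholesale, like a bulk 'index' action."""
    for doc_id, src in docs:
        index[doc_id] = src


def update_until_done(index, matches, modify, size=10000):
    """Repeat steps 1-3 until the search returns nothing; return the round count."""
    rounds = 0
    while True:
        hits = search_batch(index, matches, size)
        if not hits:
            return rounds
        # Step 2: modify a copy of each returned document.
        modified = [(doc_id, modify(dict(src))) for doc_id, src in hits]
        bulk_index(index, modified)
        rounds += 1
```

The loop only terminates because `modify` makes each document stop matching the query; if the change leaves documents matching, step 1 keeps returning them. Against a real cluster there is a second catch: indexed changes become visible to search only after a refresh (by default about once per second), so a search issued right after the bulk request can still return documents that were just updated.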

When I delete, I do this:

  1. I first search for the documents, setting a large limit on the number of returned results (say, size: 10000).
  2. I delete each document found by sending its _id to Elasticsearch, one request per document (10,000 requests).

I repeat this until step 1 no longer returns results.

Is this the right way to make an update?

When I delete, is there a way I can send several ids to delete multiple documents at once?

Answer 1:

If you want to delete or update documents by id, you can use the Bulk API:

Bulk API

The bulk API makes it possible to perform many index/delete operations in a single API call. This can greatly increase the indexing speed.

The possible actions are index, create, delete and update. index and create expect a source on the next line, and have the same semantics as the op_type parameter to the standard index API (i.e. create will fail if a document with the same index and type exists already, whereas index will add or replace a document as necessary). delete does not expect a source on the following line, and has the same semantics as the standard delete API. update expects that the partial doc, upsert and script and its options are specified on the next line.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html
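To make the line-oriented format concrete, here is a minimal Python sketch that serializes a list of actions into a `_bulk` request body. The index name `myindex` and the document fields are placeholders, and the metadata omits `_type`, which Elasticsearch 7 deprecated; add it if your cluster version still requires it.

```python
import json


def bulk_body(actions):
    """Serialize (action, metadata, source) triples into a _bulk NDJSON body.

    index, create and update take a source line; delete does not.
    """
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}))
        if action == "delete":
            continue  # delete has no source line
        if source is None:
            raise ValueError(f"{action} requires a source line")
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the body must end with a newline


body = bulk_body([
    ("index", {"_index": "myindex", "_id": "1"}, {"title": "a"}),
    # update takes a partial document under "doc" on the next line
    ("update", {"_index": "myindex", "_id": "2"}, {"doc": {"title": "b"}}),
    ("delete", {"_index": "myindex", "_id": "3"}, None),
])
```

The resulting string would be sent as a single `POST /_bulk` request with the header `Content-Type: application/x-ndjson`.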

You can also delete by query instead:

Delete By Query API

The delete by query API allows to delete documents from one or more indices and one or more types based on a query. The query can either be provided using a simple query string as a parameter, or using the Query DSL defined within the request body.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
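A minimal Python sketch of building such a request, assuming Elasticsearch 5.0 or later, where delete by query is `POST /<index>/_delete_by_query` (older releases used a different endpoint). The index name and ids are placeholders; an `ids` query is one way to target several known ids in one call, and any other Query DSL query can take its place.

```python
import json


def delete_by_query_request(index, query):
    """Return (method, path, body) for a delete-by-query call.

    Assumes Elasticsearch 5.0+, where the endpoint is POST /<index>/_delete_by_query.
    """
    return "POST", f"/{index}/_delete_by_query", json.dumps({"query": query})


# Delete several documents by id in one call, using an ids query.
method, path, body = delete_by_query_request(
    "myindex", {"ids": {"values": ["1", "2", "3"]}}
)
```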



Answer 2:

For your mass index/update operation, if you don't already use it (I'm not sure whether you do), take a look at the Bulk API documentation. It is tailored for this kind of job.

If you want to retrieve lots of documents in small batches, you should use a scan-and-scroll search instead of from/size. Related information can be found here.

To sum up:

  • the scroll API keeps a search context open between requests, so you can iterate over a large result set efficiently, one batch at a time
  • the scan search type disables sorting, which is costly

Give it a try; depending on the data volume, it could improve the performance of your batch operations.
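A minimal Python sketch of the scroll loop. `send_first` and `send_next` are hypothetical transport callables standing in for `POST /<index>/_search?scroll=1m` and `POST /_search/scroll`; `FakeCluster` is an in-memory stand-in used only to make the example runnable. Note that the `scan` search type was deprecated in Elasticsearch 2.1 and removed in 5.0; sorting on `_doc`, as below, is the usual replacement for cheap unsorted iteration.

```python
def initial_search_body(batch_size):
    # Sorting on _doc is the cheapest order and replaces search_type=scan.
    return {"size": batch_size, "sort": ["_doc"], "query": {"match_all": {}}}


def next_page_body(scroll_id, keep_alive="1m"):
    # keep_alive renews the search context for another interval.
    return {"scroll": keep_alive, "scroll_id": scroll_id}


def scroll_all(send_first, send_next, batch_size=1000):
    """Yield every hit, fetching one batch per request until a batch is empty."""
    response = send_first(initial_search_body(batch_size))
    while True:
        hits = response["hits"]["hits"]
        if not hits:
            return
        yield from hits
        response = send_next(next_page_body(response["_scroll_id"]))


class FakeCluster:
    """Hypothetical in-memory stand-in returning scroll-shaped responses."""

    def __init__(self, docs):
        self.docs = docs
        self.pos = 0
        self.size = 0

    def first_page(self, body):
        self.size, self.pos = body["size"], 0
        return self._page()

    def next_page(self, body):
        return self._page()

    def _page(self):
        batch = self.docs[self.pos:self.pos + self.size]
        self.pos += len(batch)
        return {"_scroll_id": "fake-scroll-id", "hits": {"hits": batch}}


cluster = FakeCluster([{"_id": str(i)} for i in range(2500)])
ids = [hit["_id"] for hit in scroll_all(cluster.first_page, cluster.next_page, batch_size=1000)]
```

Each batch of hits could feed straight into a `_bulk` request, combining both suggestions in this answer. When finished, the scroll context can be released with `DELETE /_search/scroll` instead of waiting for the keep-alive to expire.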

For the delete operation, you can use the same _bulk API to send multiple delete operations at once.

Each line has the following format:

{ "delete" : { "_index" : "indexName", "_type" : "typeName", "_id" : "1" } }
{ "delete" : { "_index" : "indexName", "_type" : "typeName", "_id" : "2" } }
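A minimal Python sketch that generates those lines from a list of ids, so that all deletions go out in one `POST /_bulk` request instead of one request per document. `indexName` and `typeName` are the placeholders from the example above; `doc_type` is optional because mapping types were deprecated in Elasticsearch 7.

```python
import json


def bulk_delete_body(index, ids, doc_type=None):
    """Build a _bulk body containing one delete action per id.

    Returns "" when there is nothing to delete, since Elasticsearch
    rejects a bulk request with no actions.
    """
    lines = []
    for doc_id in ids:
        meta = {"_index": index, "_id": str(doc_id)}
        if doc_type is not None:
            meta["_type"] = doc_type  # only for clusters that still use types
        lines.append(json.dumps({"delete": meta}))
    if not lines:
        return ""
    return "\n".join(lines) + "\n"  # bulk bodies must end with a newline


body = bulk_delete_body("indexName", ["1", "2"], doc_type="typeName")
```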