Why does the Elasticsearch bulk insert use \n as a delimiter?

Posted 2019-08-25 18:15

Question:

Here is a sample bulk insert from the Elasticsearch docs at: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html

POST _bulk
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }

They mentioned that "Because this format uses literal \n's as delimiters, please be sure that the JSON actions and sources are not pretty printed".
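As a rough illustration of what that warning means in practice, here is a minimal Python sketch that serializes each action and source onto its own line. The helper name `build_bulk_body` and the sample documents are made up for this example. It relies only on the fact that `json.dumps` without an `indent` argument emits a single line and escapes any newline inside a string value.

```python
import json

def build_bulk_body(operations):
    """Serialize (action_line, source) pairs into a newline-delimited body.

    Hypothetical helper for illustration only. `source` is None for
    actions that carry no document, such as "delete".
    """
    lines = []
    for action_line, source in operations:
        # No indent argument: each object stays on exactly one line.
        lines.append(json.dumps(action_line))
        if source is not None:
            lines.append(json.dumps(source))
    # The bulk format expects the last line to end with a newline as well.
    return "\n".join(lines) + "\n"

body = build_bulk_body([
    ({"index": {"_index": "test", "_id": "1"}}, {"field1": "line one\nline two"}),
    ({"delete": {"_index": "test", "_id": "2"}}, None),
])
```

Note that the newline inside the `field1` value is written out as the two characters `\` and `n`, so it cannot be mistaken for a line delimiter.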

I would like to know the reason behind this input format, and why they did not choose an array of JSON objects instead.

For example, something like:

POST _bulk
    [{{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
    { "field1" : "value1" }},
    { "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
    { "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
    { "field1" : "value3" }
    { "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} }
    { "doc" : {"field2" : "value2"} }]

The above structure is not correct, but something like that. Is there a common REST API development standard that I am missing? Delimiters instead of an array?

Answer 1:

That allows the bulk endpoint to process the body one or two lines at a time: an action line, followed by a source line when the action needs one. If it were a JSON array, ES would have to load and parse the whole JSON body into memory in order to extract one array element after another.

Knowing that the bulk body can be pretty large (e.g. hundreds of MB), this was an optimisation to prevent your ES server from crashing when it receives huge bulk requests.
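The line-at-a-time processing described above can be sketched in Python. This is only an illustration of the idea, not how Elasticsearch itself is implemented. The function name `iter_bulk_operations` and the `ACTIONS_WITH_SOURCE` set are assumptions made for this example, and the sample body is the one from the question.

```python
import io
import json

# Actions that are followed by a source line; "delete" has none.
ACTIONS_WITH_SOURCE = {"index", "create", "update"}

def iter_bulk_operations(lines):
    """Yield (action, metadata, source) tuples from an iterable of lines.

    Hypothetical helper for illustration only. At any moment it holds
    at most one action line and one source line in memory.
    """
    lines = iter(lines)
    for raw in lines:
        if not raw.strip():
            continue  # ignore blank lines
        action_line = json.loads(raw)
        # The action line is an object with a single key, e.g. "index".
        action, metadata = next(iter(action_line.items()))
        source = None
        if action in ACTIONS_WITH_SOURCE:
            source_line = next(lines, None)
            if source_line is None:
                raise ValueError(f"missing source line for {action!r}")
            source = json.loads(source_line)
        yield action, metadata, source

# A file-like object stands in for the incoming request body.
sample = io.StringIO(
    '{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }\n'
    '{ "field1" : "value1" }\n'
    '{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }\n'
    '{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }\n'
    '{ "field1" : "value3" }\n'
    '{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} }\n'
    '{ "doc" : {"field2" : "value2"} }\n'
)
operations = list(iter_bulk_operations(sample))
```

Because each line is a complete JSON document, the reader never needs to know where the whole body ends before it can start working, which is what a plain `json.loads` call on a single large array would require.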