I'm new to Elasticsearch, and I read at https://www.elastic.co/guide/en/elasticsearch/plugins/master/mapper-attachments.html that the mapper-attachments plugin is deprecated in Elasticsearch 5.0.0.
I am now trying to index a PDF file with the new ingest-attachment plugin and upload the attachment.
What I've tried so far is:
curl -H 'Content-Type: application/pdf' -XPOST localhost:9200/test/1 -d @/cygdrive/c/test/test.pdf
but I get the following error:
{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse"}],"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"not_x_content_exception","reason":"Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes"}},"status":400}
I expected the PDF file to be indexed and uploaded. What am I doing wrong?
I also tested Elasticsearch 2.3.3, but the mapper-attachments plugin is not valid for that version, and I don't want to use an older version of Elasticsearch.
First, make sure you have created an ingest pipeline that uses the attachment processor.
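A minimal sketch of such a pipeline definition, wrapped in a shell function around curl. The pipeline name `attachment`, the host in `ES_URL`, and the description text are assumptions for illustration. `field` names the document field that carries the base64 payload, and `indexed_chars` is optional (a value of -1 lifts the default limit of 100000 extracted characters).

```shell
# Assumed values for illustration; adjust them to your cluster.
ES_URL="${ES_URL:-http://localhost:9200}"
PIPELINE_NAME="attachment"

# Pipeline body: a single attachment processor that reads base64 from "data".
PIPELINE_BODY='{
  "description": "Extract attachment information",
  "processors": [
    { "attachment": { "field": "data", "indexed_chars": -1 } }
  ]
}'

# Create (or overwrite) the pipeline on the cluster.
create_pipeline() {
  curl -s -XPUT "$ES_URL/_ingest/pipeline/$PIPELINE_NAME" \
    -H 'Content-Type: application/json' \
    -d "$PIPELINE_BODY"
}
```

Calling `create_pipeline` sends the request; nothing is sent until the function is invoked.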
Then you can make a PUT request (not a POST) to your index using the pipeline you've created.
In your example, it should be something like:
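A sketch of that request under a few assumptions: the index `test` comes from the question, while the type name `doc`, the document id `1`, and the pipeline name `attachment` are hypothetical, and the URL follows the 5.x `index/type/id` form. The file is base64-encoded and wrapped in a JSON document, then streamed to curl on stdin so that a large PDF does not have to fit on the command line.

```shell
# Assumed host; adjust to your cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Base64-encode a file onto a single line (strip the wrapping newlines).
encode_file() {
  base64 < "$1" | tr -d '\n'
}

# Wrap the encoded file in a JSON document with a "data" field.
build_document() {
  printf '{"data":"%s"}' "$(encode_file "$1")"
}

# PUT the document through the pipeline; the body is read from stdin.
# "doc" (type) and "1" (id) are hypothetical names.
index_pdf() {
  build_document "$1" |
    curl -s -XPUT "$ES_URL/test/doc/1?pipeline=attachment" \
      -H 'Content-Type: application/json' \
      --data-binary @-
}
```

With the file from the question, the call would be `index_pdf /cygdrive/c/test/test.pdf`.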
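Remember that the PDF content must be base64 encoded and sent inside a JSON document. Elasticsearch cannot parse the raw PDF bytes as a document body, which is what causes the not_x_content_exception in your request.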
Hope it will help you.
Edit 1
Please make sure to read these; they helped me a lot:
Elastic Ingest
Ingest Plugin
Ingest Presentation
Edit 2
Also, you must have the ingest-attachment plugin installed.
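For reference, a sketch of installing and checking the plugin with the `elasticsearch-plugin` tool that ships with Elasticsearch 5.x. It assumes `ES_HOME` points at the installation directory (the default path below is only a guess, and on Windows the script is `elasticsearch-plugin.bat`). The node needs a restart after installation before the plugin is loaded.

```shell
# Assumed locations; adjust to your installation.
ES_HOME="${ES_HOME:-/usr/share/elasticsearch}"
ES_URL="${ES_URL:-http://localhost:9200}"

# Install the ingest-attachment plugin on this node.
install_ingest_attachment() {
  "$ES_HOME/bin/elasticsearch-plugin" install ingest-attachment
}

# List the plugins each node reports, to confirm the plugin is loaded.
list_plugins() {
  curl -s "$ES_URL/_cat/plugins?v"
}
```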
Edit 3
Before you create your ingest pipeline with the attachment processor, create your index and its mapping with the fields you will use. Make sure the mapping includes the data field (the same name as the "field" option in your attachment processor), so the processor can read the base64-encoded PDF from it and extract the content.
I set the indexed_chars option of the attachment processor to -1, so large PDF files are indexed in full instead of being cut off at the default limit of 100000 characters.
Edit 4
The mapping should be something like this:
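A minimal sketch of an index creation request with such a mapping, in the 5.x format with a mapping type. The index name `test` comes from the question; the type name `doc`, the `binary` type for `data`, and the built-in `brazilian` language analyzer on the extracted `attachment.content` field are assumptions for illustration, not a required layout.

```shell
# Assumed host; adjust to your cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Index body: "data" holds the base64 payload, "attachment.content" holds
# the text extracted by the attachment processor. "doc" is a hypothetical type.
MAPPING_BODY='{
  "mappings": {
    "doc": {
      "properties": {
        "data": { "type": "binary" },
        "attachment": {
          "properties": {
            "content": { "type": "text", "analyzer": "brazilian" }
          }
        }
      }
    }
  }
}'

# Create the index with the mapping above.
create_index() {
  curl -s -XPUT "$ES_URL/test" \
    -H 'Content-Type: application/json' \
    -d "$MAPPING_BODY"
}
```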
In this case I use the brazilian filter, but you can remove it or use your own.