Possible to store images in Elasticsearch?

Posted 2019-03-21 16:34

Question:

Is it possible to store images in an Elasticsearch cluster? If so, is there a resource that describes the workflow? I looked at the following plugin: https://github.com/kzwang/elasticsearch-image

Since we have to handle large image files (over 500GB), we are planning to use HDFS.

Answer 1:

Storing whole images in Elasticsearch is of little benefit for search: if a scaled or cropped copy of an image is used as the query, it will not match the stored original and the results will be incorrect. What you need depends on why you want to index these images.

In my case, I need to find out whether an image, after some scaling or cropping, has a close match in my database. I extract local descriptors (SIFT/SURF) from the images and use them to build an Elasticsearch index. This reduces the index size, because only a few features are stored per image instead of the whole image. For now I store the images themselves on S3, and Elasticsearch stores the IDs of these images along with the features extracted from them.
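To make the "IDs plus features, not image bytes" idea concrete, here is a minimal sketch of what one indexed document might look like. The field names (`image_id`, `s3_key`, `features`) and the sample key are hypothetical, and the intensity histogram is only a toy stand-in for real SIFT/SURF descriptors so that the example runs with the standard library alone.

```python
import json


def histogram_features(pixels, bins=8):
    """Toy stand-in for a real descriptor: a normalized intensity histogram.

    `pixels` is a flat list of grayscale values in the range 0..255.
    """
    counts = [0] * bins
    for p in pixels:
        # Map each 0..255 value to one of `bins` equal-width buckets.
        counts[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels) or 1
    return [c / total for c in counts]


def build_index_document(image_id, s3_key, features):
    """Build the document to index: a pointer to the image plus its features."""
    return {"image_id": image_id, "s3_key": s3_key, "features": features}


# Hypothetical image: eight grayscale pixels instead of a real file.
pixels = [0, 10, 64, 128, 200, 255, 255, 30]
doc = build_index_document(
    "img-0001", "images/img-0001.jpg", histogram_features(pixels)
)
print(json.dumps(doc))
```

The document stays a few hundred bytes regardless of how large the image on S3 is, which is the size reduction described above.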

Regarding elasticsearch-image: this plugin has not been updated in a while, and the most recent responses to issues were from last year. The plugin integrates LIRE with Elasticsearch; LIRE is the library that extracts several kinds of image fingerprints (features).

Possible solutions:

  1. Integrate the OpenCV library (to compute feature vectors for an image) with Elasticsearch, and build your own index from these image features instead of storing whole images. For the product architecture, you can get some hints here.

  2. Use an older version of Elasticsearch with a compatible version of elasticsearch-image.

  3. Upgrade elasticsearch-image to work with the latest version of Elasticsearch.

  4. You can also use Solr with the LireSolr plugin, which integrates the LIRE library with Solr.
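As a rough illustration of option 1, the sketch below builds an index mapping and a similarity query as plain Python dictionaries. It rests on several assumptions that are not part of the answer above: a recent Elasticsearch version (roughly 7.6 or later) that supports the `dense_vector` field type and the `cosineSimilarity` function in `script_score` queries, one fixed-length feature vector per image (a simplification, since SIFT/SURF produce many local descriptors per image), and hypothetical field names. Nothing is sent to a cluster; the local `cosine_similarity` helper only shows what the ranking would compute.

```python
import json
import math

DIMS = 8  # hypothetical length of the per-image feature vector

# Mapping body for a hypothetical "images" index: no image bytes, only
# an ID, a storage pointer, and the feature vector.
index_mapping = {
    "mappings": {
        "properties": {
            "image_id": {"type": "keyword"},
            "s3_key": {"type": "keyword"},
            "features": {"type": "dense_vector", "dims": DIMS},
        }
    }
}


def similarity_query(query_vector, size=5):
    """Search body that ranks all documents by cosine similarity."""
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # +1.0 keeps the score non-negative, as scores must be.
                    "source": "cosineSimilarity(params.query_vector, 'features') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }


def cosine_similarity(a, b):
    """Local reference implementation of the measure used in the script."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


query_vector = [0.375, 0.0, 0.125, 0.0, 0.125, 0.0, 0.125, 0.25]
print(json.dumps(index_mapping))
print(json.dumps(similarity_query(query_vector)))
```

These bodies could then be passed to whichever Elasticsearch client you use. A scaled copy of an image should yield a feature vector close to the original's, so it would score near the top even though its bytes differ.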