Is it possible to store images in an Elasticsearch cluster? If so, is there a resource describing the workflow? I checked the following link: https://github.com/kzwang/elasticsearch-image
Since we have to handle large image files (over 500GB), we are planning to use HDFS.
Storing whole images in Elasticsearch will not be very useful: if an image is scaled or cropped and then used as a query, it will no longer match the stored original, so the results will be wrong. What you need depends on why you want to index these images.
In my case, I need to find out whether an image that has been scaled or cropped has a close match in my database. I am extracting local descriptors (SIFT/SURF) from the images and using them to build an Elasticsearch index. This reduces the index size, because only a few features are stored instead of the whole image. For now I will store the images themselves on S3, and Elasticsearch will store the IDs of these images along with the features extracted from them.
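To make the "IDs plus features" idea concrete, here is a minimal sketch of what such documents could look like. The field names (image_id, s3_key, descriptors), the index name image-features and the tiny 4-value vectors are all made up for illustration; real SIFT descriptors have 128 values each and would come from your feature extractor. The only Elasticsearch-specific part is the newline-delimited format expected by the _bulk endpoint; the mapping and the query side are left out.

```python
import json


def build_image_doc(image_id, s3_key, descriptors):
    """Build the document stored in Elasticsearch for one image.

    Only the id, the S3 location and the descriptors are kept;
    the image bytes themselves stay in S3.
    Field names here are hypothetical, not a fixed schema.
    """
    return {
        "image_id": image_id,
        "s3_key": s3_key,
        # One object per local descriptor, so each vector stays separate.
        "descriptors": [{"vector": [float(v) for v in d]} for d in descriptors],
    }


def to_bulk_payload(index_name, docs):
    """Serialize documents as newline-delimited JSON for the _bulk endpoint."""
    lines = []
    for doc in docs:
        # Action line first, then the document source on the next line.
        action = {"index": {"_index": index_name, "_id": doc["image_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Toy descriptors standing in for real SIFT/SURF output.
docs = [
    build_image_doc("img-001", "images/img-001.jpg", [[0.1, 0.2, 0.3, 0.4]]),
    build_image_doc("img-002", "images/img-002.jpg", [[0.5, 0.6, 0.7, 0.8]]),
]
payload = to_bulk_payload("image-features", docs)
print(payload)
```

The payload string can then be POSTed to the cluster's _bulk endpoint with whatever HTTP or Elasticsearch client you already use. How the descriptors should be mapped and queried depends on the matching strategy, so treat the layout above as a starting point only.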
Regarding elasticsearch-image: this plugin has not been updated in a while, and the most recent responses to issues are from last year. It integrates LIRE with Elasticsearch, where LIRE is the library that extracts several kinds of image fingerprints.
Possible solutions:
Integrate OpenCV (to compute feature vectors for an image) with Elasticsearch and build your own index from these image features instead of storing whole images. For the product architecture, you can get some hints here.
Use an older version of Elasticsearch with a compatible version of elasticsearch-image.
Upgrade elasticsearch-image to work with the latest version of Elasticsearch.
You can also use Solr along with the LireSolr plugin, which integrates Solr with the LIRE library.