I would like to know if there is a way to configure Apache Solr to index files stored on S3 and to keep the index files themselves on S3 as well. I would like to have a load-balanced (LB) scheme where multiple processors can share the master index. This would allow me to bring up additional EC2 instances running Solr and point them to a common repository and index.
I am using PHP and would greatly appreciate any ideas or suggestions.
Thanks.
As for indexing files stored in AWS S3, see here. However, storing the Solr index in S3/EBS is something I have not tried yet; mounting a shared EBS volume might work.
There's a new project called lucene-s3directory. It enables Lucene to read and write indices to and from AWS S3 directly, without needing a local filesystem. I'm pretty sure it can easily be adapted for Solr. It's at a pretty early stage, so use it with caution.
S3Directory dir = new S3Directory("my-lucene-index");
dir.create();
// use it in your code in place of FSDirectory, for example
dir.close();
dir.delete();
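
To make "in place of FSDirectory" a bit more concrete, here is a minimal sketch of code written against Lucene's abstract Directory type, assuming Lucene 8 or later is on the classpath. The class name DirectoryUsageSketch and the method indexOneDocument are made up for illustration. ByteBuffersDirectory (Lucene's in-memory directory) is used as a stand-in so the example runs without AWS credentials. Passing an S3Directory instead should work the same way if it really is a drop-in Directory implementation, but I have not tested that, so treat it as an assumption.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

import java.io.IOException;

// Hypothetical class name, for illustration only.
public class DirectoryUsageSketch {

    // Adds one document through whatever Directory it is given and returns
    // the number of documents the index contains afterwards.
    static int indexOneDocument(Directory dir, String text) throws IOException {
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(dir, config)) {
            Document doc = new Document();
            doc.add(new TextField("body", text, Field.Store.YES));
            writer.addDocument(doc);
            writer.commit();
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return reader.numDocs();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in directory (assumption): swap in the S3-backed Directory here.
        try (Directory dir = new ByteBuffersDirectory()) {
            System.out.println("docs in index: " + indexOneDocument(dir, "hello from S3"));
        }
    }
}
```

The point of the design is that the indexing code only depends on Directory, so where the index lives (local disk, memory, or S3) is decided in one place.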