Multiple indexers on the same storage location in Lucene

I want to build a highly scalable application that uses Lucene as its search engine library. While browsing through the docs and FAQs, I realized that Lucene allows only one IndexWriter to be open on a storage location at a time; it enforces this by creating a write.lock file in the index directory. Multiple IndexReaders, however, can be open on the same index.
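To make the single-writer rule concrete, here is a minimal Java sketch of a write lock built on `java.nio.channels.FileChannel.tryLock()` over a `write.lock` file. It is similar in spirit to Lucene's default `NativeFSLockFactory`, but it is a simplified illustration, not Lucene's code; the class `WriteLockDemo` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical helper illustrating a single-writer lock: a writer must hold
 * an exclusive lock on "write.lock" in the index directory before writing.
 */
final class WriteLockDemo {
    private static final String LOCK_NAME = "write.lock";

    /** Returns the held lock, or null if another writer already holds it. */
    static FileLock tryAcquire(Path indexDir) throws IOException {
        Files.createDirectories(indexDir);
        FileChannel channel = FileChannel.open(indexDir.resolve(LOCK_NAME),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        try {
            FileLock lock = channel.tryLock();  // null if another process holds it
            if (lock == null) {
                channel.close();
            }
            return lock;
        } catch (OverlappingFileLockException e) {
            // Another writer inside this same JVM already holds the lock.
            channel.close();
            return null;
        }
    }

    /** Closing the channel also releases the lock acquired through it. */
    static void release(FileLock lock) throws IOException {
        lock.channel().close();
    }
}
```

A second writer asking for the lock is refused until the first one releases it. In this sketch, readers would simply never call `tryAcquire`, which is why any number of them can coexist with the one writer.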

I am interested in building an architecture with a number of indexers running on different machines/servers and multiple searchers answering various types of queries against the indexes those indexers create. Searchers and indexers will run on different computers.

In such a scenario it would be preferable to have multiple indexers write to the same index storage location. How can I achieve this? Should I use something like NFS (Network File System)? Has this problem been addressed by Solr or some other framework built on top of Lucene?

One obvious solution that comes to mind is to create one index per indexer and have the searchers query across multiple index directories. But this would create a large number of index directories, as many as there are indexer servers, which I don't think is desirable. I want (# of index dirs) << (# of indexers) < (# of searchers).
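One way to get fewer index directories than indexers is to fix the number of shards up front, give each shard exactly one writer, route every document to a shard by hashing its ID, and have searchers fan a query out to all shards and merge the results. This is roughly the pattern that Elasticsearch and SolrCloud automate. Within a single JVM, Lucene's `MultiReader` can also wrap several index readers so that one `IndexSearcher` queries them together. The pure-Java sketch below only illustrates the routing and scatter-gather idea: `Shard` and `ShardedIndex` are hypothetical classes, each shard is an in-memory map standing in for one index directory, and scoring is a toy sum of term frequencies, not Lucene's relevance scoring.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical stand-in for one index directory with exactly one writer. */
final class Shard {
    // term -> (docId -> term frequency)
    private final Map<String, Map<String, Integer>> postings = new HashMap<>();

    synchronized void add(String docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) continue;
            postings.computeIfAbsent(term, t -> new HashMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    /** Toy scoring: sum of the query terms' frequencies in each document. */
    synchronized Map<String, Integer> search(String query) {
        Map<String, Integer> scores = new HashMap<>();
        for (String term : query.toLowerCase().split("\\W+")) {
            Map<String, Integer> docs = postings.get(term);
            if (docs == null) continue;
            docs.forEach((doc, tf) -> scores.merge(doc, tf, Integer::sum));
        }
        return scores;
    }
}

/** Routes documents to a fixed number of shards and fans searches out to all. */
final class ShardedIndex {
    private final List<Shard> shards = new ArrayList<>();

    ShardedIndex(int numShards) {
        for (int i = 0; i < numShards; i++) shards.add(new Shard());
    }

    /** Any number of indexers can call this; each document lands in one shard. */
    int route(String docId) {
        return Math.floorMod(docId.hashCode(), shards.size());
    }

    void index(String docId, String text) {
        shards.get(route(docId)).add(docId, text);
    }

    /** Scatter the query to every shard, gather the hits, keep the top k. */
    List<String> search(String query, int k) {
        Map<String, Integer> merged = new HashMap<>();
        for (Shard shard : shards) {
            merged.putAll(shard.search(query));  // doc IDs are disjoint across shards
        }
        return merged.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

For example, with `new ShardedIndex(2)`, documents `"a"` ("lucene search engine") and `"b"` ("lucene lucene index") end up in different shards, yet `search("lucene", 10)` returns `[b, a]`. In a real cluster, per-shard scores are not perfectly comparable because term statistics differ between shards, which is one of the details the frameworks mentioned in the answers handle for you.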

What alternatives do I have in this case?

Answer 1:

First of all: never use NFS with Lucene; it's simply slow and risky.

When it comes to scalability and high availability, I'd suggest you just let Elasticsearch do all the hard work for you, so that you can concentrate on your data. You can of course have multiple threads indexing data.
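In plain Lucene, a single `IndexWriter` is thread-safe, so several threads can share it directly. The same idea generalizes to many indexing clients feeding one writer that owns the index. The sketch below assumes an in-memory map as the "index" and uses hypothetical names (`SingleWriterPipeline`, `Doc`, `run`); it is an illustration of the single-owner queue pattern, not how Elasticsearch is implemented internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

/** Many producer threads send documents through a queue to one writer thread. */
final class SingleWriterPipeline {
    private static final class Doc {
        final String id;
        final String text;
        Doc(String id, String text) { this.id = id; this.text = text; }
    }

    private static final Doc END = new Doc(null, null);  // end-of-stream marker

    static Map<String, String> run(int producers, int docsPerProducer)
            throws InterruptedException, ExecutionException {
        BlockingQueue<Doc> queue = new LinkedBlockingQueue<>();
        Map<String, String> index = new HashMap<>();  // touched only by the writer thread

        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    Doc doc = queue.take();
                    if (doc == END) break;
                    index.put(doc.id, doc.text);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        // Each producer stands in for one indexer feeding documents.
        ExecutorService pool = Executors.newFixedThreadPool(producers);
        List<Future<?>> futures = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            final int id = p;
            futures.add(pool.submit(() -> {
                for (int i = 0; i < docsPerProducer; i++) {
                    queue.add(new Doc("doc-" + id + "-" + i, "body " + i));
                }
            }));
        }
        for (Future<?> f : futures) f.get();  // wait until every producer is done
        pool.shutdown();

        queue.add(END);  // all documents are queued; tell the writer to stop
        writer.join();   // join() makes the writer's updates visible here
        return index;
    }
}
```

For example, `SingleWriterPipeline.run(4, 250)` returns a map holding all 1000 documents, even though only one thread ever modified it.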

If you want to know more about the distributed nature of Elasticsearch, I'd suggest having a look at this video.

Answer 2:

Take a look at ElasticSearch and Solr Cloud.

Comparison of ElasticSearch and Solr.