Is it possible to use Lucene as a full-fledged data store, like other NoSQL stores such as MongoDB or CouchDB?
I know there are some limitations. For example, documents updated by one indexer are not visible to another indexer, so the indexer has to be restarted to pick up the updates.
But I stumbled upon Solr recently, and it seems these problems are avoided there by some kind of snapshot replication.
So I thought I could use Lucene as a data store, since it also uses the same kind of documents (JSON-based) that MongoDB and CouchDB use internally to manage documents, and its proven indexing algorithm fetches records very fast.
But I am curious: has anybody tried this before? If not, what are the reasons for not choosing this approach?
There is also the problem of durability. While a Lucene index should never get corrupted, I've seen it happen. And the approach Lucene takes to repairing a broken index is "throw it away and rebuild from the original data", which makes perfect sense for an indexing tool. But it does require you to have the data stored somewhere else.
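To illustrate why the original data has to live elsewhere, here is a minimal sketch in plain Python. It does not use Lucene at all: `SOURCE_OF_TRUTH` and `rebuild_index` are hypothetical names, and the "index" is just a toy term-to-ids map. The point is only that recovery means rebuilding from scratch, which is impossible if the index was the only copy.

```python
from collections import defaultdict

# Authoritative records (hypothetical); in practice these would live
# in a database or on disk, outside the search index.
SOURCE_OF_TRUTH = {
    1: "lucene is a search library",
    2: "solr is built on lucene",
}

def rebuild_index(records):
    """Build a fresh term -> set-of-ids map, ignoring any previous index."""
    index = defaultdict(set)
    for doc_id, text in records.items():
        for term in text.split():
            index[term].add(doc_id)
    return dict(index)

index = rebuild_index(SOURCE_OF_TRUTH)
index = None                            # simulate losing the index to corruption
index = rebuild_index(SOURCE_OF_TRUTH)  # recovery is a full rebuild from the source
```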
I've only worked with Solr, the Lucene derivative (and I would recommend using Solr to just about anyone), so my opinion may be a little biased. It should be possible to use Solr as a data store, yes; however, it wouldn't be very useful without something more permanent in the background.
The problem you may encounter is that entering data into Solr does not guarantee you will get it back when you expect it. Barring the use of pretty strict faceting, you may have trouble retrieving your data simply because the indexer has decided to lump your results together in a certain way.
I've experimented a little with this approach, but the only real benefit I saw was in situations where you want the search index on the client side, so that the client can search quickly locally and then query the database for extended information.
My suggestion is to use Solr for search and have it return a short sample of the data you may want, as well as an index (an ID or key) for further querying in a traditional data store.
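A rough sketch of that split, under loud assumptions: a toy in-memory inverted index stands in for Solr, SQLite stands in for the traditional data store, and the "index for further querying" is taken to be a record ID. None of the names below (`articles`, `build_index`, `search`, `fetch_full`) come from Solr's API; they are purely illustrative.

```python
import sqlite3
from collections import defaultdict

def make_store():
    """Create the authoritative store (SQLite stands in for any database)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
    db.executemany(
        "INSERT INTO articles (id, title, body) VALUES (?, ?, ?)",
        [
            (1, "Lucene basics", "Lucene is a search library built around an inverted index."),
            (2, "Solr replication", "Solr can replicate index snapshots between servers."),
        ],
    )
    db.commit()
    return db

def build_index(db, snippet_len=40):
    """Toy search index: term -> set of ids, plus a short snippet per id."""
    postings = defaultdict(set)
    snippets = {}
    for doc_id, title, body in db.execute("SELECT id, title, body FROM articles"):
        for term in (title + " " + body).lower().split():
            postings[term.strip(".,")].add(doc_id)
        snippets[doc_id] = body[:snippet_len]
    return postings, snippets

def search(index, term):
    """Return (id, snippet) pairs only; the full record stays in the database."""
    postings, snippets = index
    return [(doc_id, snippets[doc_id]) for doc_id in sorted(postings.get(term.lower(), ()))]

def fetch_full(db, doc_id):
    """Second step: look the complete record up in the data store by id."""
    return db.execute(
        "SELECT id, title, body FROM articles WHERE id = ?", (doc_id,)
    ).fetchone()

db = make_store()
index = build_index(db)
hits = search(index, "replicate")   # cheap search step: ids and snippets
full = fetch_full(db, hits[0][0])   # authoritative record from the database
```

The search side only ever holds derived data (terms and snippets), so it can be thrown away and rebuilt from the database at any time.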
TL;DR: Yes, but I wouldn't recommend it.
The Guardian uses Solr as their data store. You can see some of their reasons in that slideshow.
In any case, their website is very heavily trafficked (certainly more so than anything I work on), so I would feel comfortable saying that Solr will probably work for you, since it scales to their requirements.