We are working on implementing Solr on an e-commerce site. The site is continuously updated with new data, either through updates to existing product information or through new products being added altogether.
We are using it in an ASP.NET MVC3 application with SolrNet.
We are facing an issue with indexing. We currently commit using the following:
private static ISolrOperations<ProductSolr> solrWorker;

public void ProductIndex()
{
    // Initialize the Solr connection instance if it has not been created yet
    if (solrWorker == null)
    {
        Startup.Init<ProductSolr>("http://localhost:8983/solr/");
        solrWorker = ServiceLocator.Current.GetInstance<ISolrOperations<ProductSolr>>();
    }
    var products = GetProductIdandName();
    solrWorker.Add(products);
    solrWorker.Commit();
}
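For context, ProductSolr is nothing more than a minimal document mapping along these lines (assuming the id field is configured as the uniqueKey in our schema.xml):

using SolrNet.Attributes;

// Minimal document mapping: only the two fields we index so far.
public class ProductSolr
{
    [SolrUniqueKey("id")]   // should match the uniqueKey field in schema.xml
    public string Id { get; set; }

    [SolrField("name")]
    public string Name { get; set; }
}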
This is just a simple test application where we insert only the product name and ID into the Solr index. Every time it runs, the new products get indexed all at once and are available when we search. I think this recreates the data in the Solr index every time it runs? Correct me if I'm wrong.
My questions are:
- Does this recreate the Solr index as a whole, or does it update only the data that is changed or new? How? Even if it updates only changed/new data, how does it know which data has changed? With a large data set, this must cause some issues.
- What is an alternative way to track what has changed since the last commit, and is there a way to add only the changed products to the Solr index? (See the sketch after this list.)
- What happens when we update an existing record in Solr? Does it delete the old data, insert the new data, and recreate the whole index? Is that resource intensive?
- How do big e-commerce retailers do this with millions of products?
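For the change-tracking question, is something like the following delta-indexing sketch the right direction? Here GetProductsChangedSince is a hypothetical helper that would query our product database for rows whose LastModified timestamp is newer than the previous run:

private static DateTime lastIndexRun = DateTime.MinValue;

public void ProductDeltaIndex()
{
    // GetProductsChangedSince is a placeholder for a database query
    // filtering on a LastModified column.
    List<ProductSolr> changed = GetProductsChangedSince(lastIndexRun);
    if (changed.Count > 0)
    {
        solrWorker.Add(changed);   // same Add/Commit as before, but only for changed rows
        solrWorker.Commit();
    }
    lastIndexRun = DateTime.UtcNow;
}

On a real site we would presumably persist lastIndexRun somewhere durable (e.g. a database table) rather than in a static field, so it survives application restarts.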
What is the best strategy to solve this problem?