Updating Solr Index when product data has changed

Posted 2019-08-05 18:45

Question:

We are working on implementing Solr on an e-commerce site. The site is continuously updated with new data, either through updates to existing product information or the addition of new products.

We are using it in an ASP.NET MVC 3 application with SolrNet.

We are facing an issue with indexing. We are currently committing as follows:

    private static ISolrOperations<ProductSolr> solrWorker;

    public void ProductIndex()
    {
        // Initialize the SolrNet connection once and reuse it
        if (solrWorker == null)
        {
            Startup.Init<ProductSolr>("http://localhost:8983/solr/");
            solrWorker = ServiceLocator.Current.GetInstance<ISolrOperations<ProductSolr>>();
        }

        // Add (or re-add) all products and make them searchable with a commit
        var products = GetProductIdandName();
        solrWorker.Add(products);
        solrWorker.Commit();
    }

This is just a simple test application where we insert only the product name and id into the Solr index. Every time it runs, the new products are indexed all at once and become available when we search. I think this creates a new data index in Solr every time it runs? Correct me if I'm wrong.
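For example, a quick way to verify whether re-running the job duplicates documents or replaces them (this is a minimal check, assuming Id is the uniqueKey declared in schema.xml and that solrWorker is already initialized as above) is to count the documents after each run:

    // Sanity check: count documents after each indexing run.
    // If Id is the <uniqueKey> in schema.xml, re-adding the same products
    // should leave the count unchanged (documents are replaced, not duplicated).
    var allDocs = solrWorker.Query(SolrQuery.All, new QueryOptions { Rows = 0 });
    Console.WriteLine("Documents in index: " + allDocs.NumFound);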

My questions are:

  1. Does this recreate the Solr index data as a whole, or does it just update the data that is changed/new? How? Even if it only updates changed/new data, how does it know which data has changed? With a large data set, this must cause some issues.
  2. What is the alternative way to track what has changed since the last commit, and is there a way to add to the Solr index only those products that have changed?
  3. What happens when we update an existing record in Solr? Does it delete the old data, insert the new, and recreate the whole index? Is this resource intensive?
  4. How do big e-commerce retailers do this with millions of products?

What is the best strategy to solve this problem?

Answer 1:

  1. When you do an update, only that record is deleted and re-inserted; Solr does not update records in place, and the other records are untouched. When you commit, new segments are created containing the new data. On optimize, the segments are merged into a single segment.

  2. You can use an incremental (delta) build technique to add/update only the records changed since the last build. The DataImportHandler (DIH) provides this out of the box; if you are handling it manually through jobs, you can maintain a timestamp of the last build and index only what changed after it (see the sketch after this list).

  3. Solr does not have an in-place update operation; it performs a delete and an add. So you have to send the complete document again, not just the updated fields. This is not resource intensive by itself; usually only commit and optimize are.

  4. Solr can handle a large amount of data. You can use sharding if your data grows beyond the capacity of a single machine.
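To illustrate points 2 and 3, here is a minimal sketch of a timestamp-based delta build with SolrNet. The helpers GetProductsModifiedSince, LoadLastIndexTime, and SaveLastIndexTime are assumptions (your data access and scheduling will differ); the key idea is that only products changed since the last run are re-sent, and each one is sent as a complete document so Solr can replace the old copy by its unique key.

    // Minimal delta-indexing sketch. Assumed helpers: GetProductsModifiedSince,
    // LoadLastIndexTime, SaveLastIndexTime - adapt to your own persistence/scheduler.
    public void IncrementalProductIndex()
    {
        DateTime lastRun = LoadLastIndexTime();   // e.g. read from a file or DB row
        DateTime thisRun = DateTime.UtcNow;

        // Fetch only products created/updated since the last run,
        // but build the *complete* document for each of them (point 3).
        List<ProductSolr> changed = GetProductsModifiedSince(lastRun);

        if (changed.Count > 0)
        {
            // Re-adding a document with an existing unique key deletes the old
            // copy and inserts the new one; untouched products stay as they are (point 1).
            solrWorker.Add(changed);
            solrWorker.Commit();
        }

        SaveLastIndexTime(thisRun);
    }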