Solr: safe data import and core swap on a high-traffic site

Posted 2019-03-26 04:55

Question:

Hello fellow technicians,

Let's assume we have a (PHP) website with millions of visitors a month, and we are running a Solr index of 4 million documents for it. Solr runs on 4 separate servers: one is the master and the other 3 are replicated slaves.

Thousands of documents can be inserted into Solr every 5 minutes. In addition, users can update their accounts, which should also trigger a Solr update.

I am looking for a strategy to rebuild the index quickly and safely without missing any document, along with a safe delta/update strategy. I have thought of an approach and want to share it with the experts here to hear their opinion: should I go with it, or would they advise something (totally) different?

Solr DataImport

For all operations I want to use a single data import handler. I want to combine full and delta import in one config file, as described in DataImportHandlerDeltaQueryViaFullImport. Our data source is a MySQL database.
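To illustrate the single-handler idea: with the DeltaQueryViaFullImport approach, the entity query carries a condition along the lines of `'${dataimporter.request.clean}' != 'false' OR last_modified > '${dataimporter.last_index_time}'`, so the same `full-import` command behaves as a full rebuild or as a delta depending on the `clean` request parameter. Below is a minimal Python sketch of how the two calls could be built. The host, the core names and the `last_modified` column are assumptions for illustration, not taken from the actual setup.

```python
from urllib.parse import urlencode


def dih_url(solr_base, core, clean):
    """Build a DataImportHandler URL for the single combined handler.

    clean=True  -> wipe the index and import everything (full rebuild)
    clean=False -> keep the index; the SQL condition limits the import
                   to rows changed since the last recorded index time
    """
    params = {
        "command": "full-import",
        "clean": "true" if clean else "false",
        "commit": "true",
    }
    return f"{solr_base}/{core}/dataimport?{urlencode(params)}"


# Hypothetical base URL and core names.
full_rebuild = dih_url("http://localhost:8983/solr", "reindex", clean=True)
delta_update = dih_url("http://localhost:8983/solr", "live", clean=False)
print(full_rebuild)
print(delta_update)
```

The per-minute cron job would call the second URL, while the rebuild would call the first one against the 'reindex' core.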

Rebuilding index

For rebuilding the index I have the following in mind: we create a new core called 'reindex' next to the 'live' core. With the DataImportHandler we rebuild the whole document set (4 million documents), which takes about 1-2 hours in total. Meanwhile, the live index still receives updates, inserts and deletions every minute.

After the rebuild, the new index is already 1-2 hours out of date. To close that gap we run one 'delta' import against the new core to commit all changes from the last 1-2 hours. When this is done we do a core swap. The normal 'delta' import, which runs every minute, will then pick up the new core.

Committing updates to live core

To keep our live core up to date we run the delta import every minute. Because of the core swap, the reindex core (which is now the live core) will be tracked and kept up to date. I am guessing it should not really be a problem if this index lags by a few minutes, because dataimport.properties will be swapped as well? The delta import has to catch up on those minutes of delay, but that should be possible.
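The rebuild sequence described here (full import into 'reindex', catch-up delta, then swap) could be scripted roughly as follows. This is only a sketch under assumptions: the base URL is hypothetical, the core admin handler is assumed to live at the default `/admin/cores` path, and the catch-up step assumes the combined handler where `clean=false` acts as a delta. The HTTP call is injectable so the sequence can be tried without a running Solr.

```python
import time
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen


def http_get(url):
    """Fetch a URL and return the response body as text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def dih_is_idle(status_xml):
    """True if the DataImportHandler status response reports 'idle'."""
    root = ET.fromstring(status_xml)
    return any(
        el.get("name") == "status" and (el.text or "").strip() == "idle"
        for el in root.iter("str")
    )


def rebuild_and_swap(base, live="live", rebuild="reindex",
                     get=http_get, poll_seconds=30):
    def dih(core, **params):
        return get(f"{base}/{core}/dataimport?{urlencode(params)}")

    def wait_until_idle(core):
        # Note: polling immediately after starting an import may race
        # with the import thread actually becoming busy.
        while not dih_is_idle(dih(core, command="status")):
            time.sleep(poll_seconds)

    dih(rebuild, command="full-import", clean="true")   # 1. full rebuild
    wait_until_idle(rebuild)
    dih(rebuild, command="full-import", clean="false")  # 2. catch-up delta
    wait_until_idle(rebuild)
    wait_until_idle(live)                               # 3. no import on live
    swap = urlencode({"action": "SWAP", "core": live, "other": rebuild})
    return get(f"{base}/admin/cores?{swap}")            # 4. swap the cores
```

Between step 3 and step 4 the per-minute cron could still start a new import on the live core, so it is probably safer to pause that cron job for the duration of the swap.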

I hope you understand my situation and my strategy and can advise whether I'm doing it the right way in your eyes. I would also like to know if there are any bottlenecks I haven't thought about. We are running Solr version 1.4.

One question I do have: what about replication? If the master server swaps the core, how do the slaves handle this?

And are there any risks of losing documents when swapping, etc.?

Thanks in advance!

Answer 1:

Good (and hard) question!

A full-import is a very heavy operation; in general it's better to run delta queries so that only the latest changes in your RDBMS are applied to the index. I see why you swap the master core when you need to do a full-import: since it takes two hours, you keep the live core up to date with delta-imports while the full-import runs on the new core. Sounds good, as long as the full-import is not run that often.

Regarding the replication, I would make sure that there isn't any replication in progress before swapping the master core. For more details about how replication works you can have a look at the Solr wiki if you haven't done it yet.

Furthermore, I would make sure that there isn't any delta-import running on the live core before swapping the master core.
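The two pre-swap checks above could be expressed as a small guard. This is a hedged sketch: it takes the XML bodies of the live core's `dataimport?command=status` response and of each slave's `replication?command=details` response as input. The `isReplicating` field name is an assumption about the details output and should be verified against the Solr version in use; the host names are hypothetical.

```python
import xml.etree.ElementTree as ET


def named_values(xml_text, name):
    """Return the text of every element whose name attribute equals `name`."""
    root = ET.fromstring(xml_text)
    return [(el.text or "").strip() for el in root.iter() if el.get("name") == name]


def swap_blockers(live_dih_status_xml, slave_details_xml_by_host):
    """List the reasons why the core swap should be postponed (empty = go)."""
    blockers = []
    # The response header also carries a numeric "status", so test membership.
    if "idle" not in named_values(live_dih_status_xml, "status"):
        blockers.append("delta-import still running on live core")
    for host, xml_text in slave_details_xml_by_host.items():
        # Assumed field name: isReplicating.
        if "true" in named_values(xml_text, "isReplicating"):
            blockers.append(f"replication in progress on {host}")
    return blockers
```

A swap script would fetch these responses first and only issue the swap when `swap_blockers(...)` returns an empty list.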



Answer 2:

We have a slightly different situation at our end. There are two DataImportHandlers: one for full import, the other for delta import. The delta import is triggered by cron every 3 hours and takes minutes to complete. The full import of about 10 million documents takes ~48 hours (insane!). A large part of this is network latency, since a huge amount of data is fetched from a MySQL table for every document. These two tables reside on two different MySQL servers and cannot be joined.

We have a 'live' core, which is the one receiving delta imports. We introduce another 'rebuild' core and perform a full index, which takes ~48 hours to finish. During this time, we keep track of all the documents that have been updated in or deleted from the 'live' core, and then do a delta import into the 'rebuild' core to bring both to the same state. On a normal day, once both cores are in the same state, we would swap them and serve from the rebuild core. (Who will monitor that the rebuild core has finished full indexing and has applied the delta patches as well?)

Sometimes we want both the 'live' and 'rebuild' cores serving at the same time for A/B testing. In those periods, both cores would receive delta imports for consistency, and both would be serving. Based on the outcome, we would keep one and remove the other by swapping.

To make this whole setup operationally stable, we plan to introduce a monitor process that checks whether the 'rebuild' core is still indexing or is done. Once it has finished indexing, the monitor process would update it with the delta documents and activate the delta indexing cron for both cores. Upon completion of the A/B phase, one of the cores would be unloaded and the other swapped. The extra crons would then be disabled.

There are a few more moving parts in this design, and the reliability of the monitor process is critical to smooth operation. Any suggestions or alternatives?
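One way to keep such a monitor reliable is to reduce it to a pure decision function that is called on every tick, with the actual Solr and cron calls living elsewhere. The following is only a sketch of that idea; the flag names and action names are hypothetical and simply mirror the steps described above.

```python
def next_action(rebuild_busy, delta_applied, ab_test_active, ab_test_finished):
    """Decide what the monitor should do on this tick.

    rebuild_busy     -- the 'rebuild' core is still running its full import
    delta_applied    -- the tracked updates/deletes were applied to 'rebuild'
    ab_test_active   -- both cores are meant to serve traffic for a while
    ab_test_finished -- the A/B phase is over and a winner was chosen
    """
    if rebuild_busy:
        return "wait"
    if not delta_applied:
        return "apply_delta_to_rebuild"
    if ab_test_active and not ab_test_finished:
        return "run_delta_cron_on_both_cores"
    # Either there was no A/B phase or it has ended.
    return "swap_cores_and_disable_extra_cron"


# Example tick: full import finished, delta not yet applied.
print(next_action(rebuild_busy=False, delta_applied=False,
                  ab_test_active=False, ab_test_finished=False))
```

Because the function has no side effects, every combination of flags can be covered by unit tests, and the monitor can be restarted at any point without losing its place as long as the flags are derived from the current state of the cores.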