Basically, I need parts of my database data synchronized across up to several dozen sites. The ideal solution seems to be a central server hosting that data.
Each page load would have to fetch data from both database servers, the local one and the remote one, and writes to the remote server would be quite common too.
While the DB server can be made as fast as desired hardware-wise, I'm wary of two bottlenecks:
- Multiple database connections must be established on each page load.
- Latency of the signal traveling between two physical locations.
Am I right to worry? Would it be wiser to sync the databases with cron jobs or other techniques?
Along with assigning a bounty, I'm adding to the question in the hope that an expert with real-life experience of this comes along:
First, a warning: what you're trying to do is not easy. Whilst MySQL supports master/slave replication, and you can have multiple masters and slaves running in all sorts of tiers, what you really need to think about is "how do I recover from a database server failure?" Do you promote a slave? What about consistency (it's all but guaranteed that replication to the slaves will be incomplete at the moment of failure)? And so on. You also need to consider schema modifications: everything is fine and dandy as long as you have the same schema on all servers, but as soon as you need to push a code update that requires a simultaneous database change, you can't rely on that schema change having propagated to all the replicas.
Okay, warning over, so how do you do it? The easiest way is to fire up the latest version of phpMyAdmin, which allows you to configure replication very quickly and easily. Before you do that, make sure you have binary logging turned on in all the MySQL servers, as this will be your crash-recovery savior: http://dev.mysql.com/doc/refman/5.0/en/binary-log.html
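To give a feel for it, this is roughly the my.cnf fragment involved (a minimal sketch; the server-id value and log file name are placeholders, and each server in the topology needs a unique server-id):

    # /etc/my.cnf on the master - a sketch, values are placeholders
    [mysqld]
    server-id = 1          # must be unique on every server in the topology
    log-bin   = mysql-bin  # turns on the binary log (replication + crash recovery)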
Where you site your servers is the next big question. If your users aren't geographically dispersed and your query loads are low-ish, you can probably host them all on a private network in the same data centre. The master-slave replication will give you a large performance uplift in any case, since all database reads should be made against the slaves and only writes performed against the master.
If you need to geo-locate the servers so they can't all be housed in the same data centre, then things get a little more difficult; you now have latency to contend with. Since the internet is not instantaneous, a write made to the master takes time to propagate to the slaves, so a SELECT issued very shortly after the write probably won't return the new data - it won't have replicated to the slave yet. This is called "eventual consistency", and it is relatively easy to overcome once you recognise it's going to happen and code to expect it - i.e. never assume the data is present.
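To make that concrete, here is a minimal sketch of the "never assume the data is present" rule in PHP (the PDO connections, table and column names are hypothetical):

    <?php
    // A read that tolerates replication lag: try the local slave first,
    // and fall back to the master instead of concluding the row is missing.
    // $slave and $master are hypothetical PDO connections.
    function fetchRow(PDO $slave, PDO $master, int $id): ?array
    {
        $stmt = $slave->prepare('SELECT * FROM items WHERE id = ?');
        $stmt->execute([$id]);
        $row = $stmt->fetch(PDO::FETCH_ASSOC);

        if ($row === false) {
            // The row may simply not have replicated yet.
            $stmt = $master->prepare('SELECT * FROM items WHERE id = ?');
            $stmt->execute([$id]);
            $row = $stmt->fetch(PDO::FETCH_ASSOC);
        }

        return $row === false ? null : $row;
    }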
I can't possibly do this question real justice on this site. Your best bet is to read a book; I highly recommend this one:
MySQL High Availability - ISBN-13: 978-0-596-80730-6
MySQL replication is definitely the way to go. The problem with a single database server is that if the load becomes too high, all your sites go down at once. You want to spread the load as much as possible, because a single server going down or becoming overloaded takes everything with it.
There are a few things to keep in mind when dealing with replication. Throw in a load-balancing server in front of your database slaves and your database load woes go away!
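If it helps, here is a minimal sketch of what that could look like with HAProxy balancing reads across two slaves (the host names, port, and check user are all hypothetical):

    # /etc/haproxy/haproxy.cfg - a sketch, not a production config
    listen mysql-read
        bind 0.0.0.0:3307            # the app points its read connections here
        mode tcp
        balance roundrobin
        option mysql-check user haproxy_check   # needs a matching MySQL user
        server slave1 10.0.0.11:3306 check
        server slave2 10.0.0.12:3306 check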
Any method that relies on offline sync is wasting the benefits of MySQL replication. Given the somewhat unclear situation you've described, your solution can be as simple as keeping READs and WRITEs separate: on the local database you serve all the reads, while writes go to the remote master and replication carries them back down. The problem this solves is the remote round-trip on every page load; the benefit is that every site still ends up with current data.
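A minimal sketch of that read/write split in PHP (the DSNs and credentials are placeholders):

    <?php
    // Reads go to the local slave; writes go to the remote master.
    // Host names and credentials below are placeholders.
    $read  = new PDO('mysql:host=127.0.0.1;dbname=app', 'app', 'secret');
    $write = new PDO('mysql:host=master.example.com;dbname=app', 'app', 'secret');

    // Fast local read on every page load:
    $user = $read->query('SELECT * FROM users WHERE id = 42')->fetch(PDO::FETCH_ASSOC);

    // Occasional write goes to the master and replicates back down:
    $stmt = $write->prepare('UPDATE users SET last_seen = NOW() WHERE id = ?');
    $stmt->execute([42]);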
My quick answer would be to use a job-queue system like Gearman to offload the sync work to. That way it doesn't affect page load or user experience: you simply create a Gearman job, it gets sent to the Gearman queue, and a worker gets to it as soon as it can.
This also seems like a much better, near-instant alternative to a cron, because the job is added to the queue immediately and handled almost immediately too. And since you seem to want to replicate only selected data, I don't see how MySQL replication would be of much assistance.
I have worked with Gearman before (including with PHP), and it was a great solution for handing work off to be completed elsewhere whenever the page load didn't need to wait for that job to finish.
It may not be quite as simple as I've made it sound, since you do need to set up and learn Gearman, but it is a very handy tool.
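For illustration, a minimal sketch using the PHP Gearman extension (the 'sync_write' function name, server address, and payload are hypothetical):

    <?php
    // Client side (runs during the page load): queue the sync and return fast.
    $client = new GearmanClient();
    $client->addServer('127.0.0.1', 4730);
    $client->doBackground('sync_write', json_encode(['table' => 'users', 'id' => 42]));

    // Worker side (a separate long-running process): do the actual remote write.
    $worker = new GearmanWorker();
    $worker->addServer('127.0.0.1', 4730);
    $worker->addFunction('sync_write', function (GearmanJob $job) {
        $payload = json_decode($job->workload(), true);
        // ... write $payload to the remote/central database here ...
    });
    while ($worker->work());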
Hope this helps!
This question really comes down to your situation, and I believe you've identified the two main problems with the central-DB solution - so yes, you are right to be concerned.
I would personally opt to sync the data to the servers using a cron job (or whatever method you choose), cutting down on hardware costs and page-load times. It's the more hands-on solution, but in terms of its benefits (faster page loads, no dependency on the central DB, lower costs) it is, to me, the correct one.
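As a sketch, the cron side could be as simple as a scheduled script (the path, interval, and log file are placeholders):

    # Hypothetical crontab entry: pull changes from the central DB every 5 minutes
    */5 * * * * /usr/bin/php /var/www/scripts/sync_from_central.php >> /var/log/db-sync.log 2>&1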
Alternatively, you could always set up a small MySQL database on a remote server, create a few test websites, and run some benchmarks; that would give you real data on whether you are happy with the loading times.
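A benchmark of that sort can be as crude as timing a trivial query in PHP (the DSN and credentials are placeholders):

    <?php
    // Rough latency check: average round-trip time for a no-op query
    // against the remote server.
    $pdo = new PDO('mysql:host=remote.example.com;dbname=test', 'test', 'secret');
    $start = microtime(true);
    for ($i = 0; $i < 100; $i++) {
        $pdo->query('SELECT 1')->fetch();
    }
    printf("avg round-trip: %.1f ms\n", (microtime(true) - $start) * 1000 / 100);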
I was wondering whether you are using SQL Server as your back end or something else. I'm fairly sure that with SQL Server you can use SQL replication (http://technet.microsoft.com/en-us/library/ms151198.aspx) to achieve the desired goal. At that point your local apps would access their own SQL instance, while each instance would replicate and sync its data with the main DB server. The end result is that your central DB will always be up to date and hold aggregated data from each and every satellite SQL Server. (Though please don't quote me on this one... I am not a SQL expert.)
(Sorry, I just realized you are using PHP/MySQL... and probably favor open source... However, I think this is worth looking into.)