Real-time unidirectional synchronization from sql-

2020-05-07 06:59发布

问题:

In my previous question on this portal, I had asked about some insight about syncing data between SQL Server and key-value based data repositories.

In lieu of the same problem (one way real-time synchronization from SQL to HBase or any other database), I need to take care of some performance and latency considerations and did not find a very foolproof way of doing it.

  1. We have multiple SQL 2008 data shards where data is updated from various sources and processed by many processes at the same time (and UI reads from the same shards).

  2. The goal is to get all updates in selected tables at any point of time and transport them to other data source in almost real-time fashion.

  3. Amount of change in SQL shards will stays in 100-500 MB range (if we keep the frequency of 1 min). We do not want to make major changes in SQL servers as we would discard it after migrating complete system.

  4. Unfortunately, our data access layer is messed up in various layers. Otherwise, it would have been the best way to fork the updates in two directions.

  5. Triggers will slow down the shards and leave them in an unresponsive state.

  6. Not sure if SQL Server 2008 has something similar to SQL Server 2005 Notification services and how effective that would be.

Any other innovative solution would greatly helpful.

Here my problem is not about transforming the data from relational to Key-value form (it's fairly easy), but how to get the SQL Server updates in real-time (can afford the latency of 1-2 minutes) without affecting the user experience.

回答1:

have you looked at SQL Service Broker? here's a link with some info on it: http://blogs.msdn.com/b/sql_service_broker/archive/2008/07/09/real-time-data-integration-with-service-broker-and-other-sql-techniques.aspx



回答2:

There are layers of data from bottom to top: storage, file system, db and app.

The most efficient way of doing this is using storage replication. It almost has no impact on performance, can be configured as sync or async, and is not free. You may google SRDF or MirrorView for a concept of it.

Then you can have a look at file system replication. It's similar to storage replication but happens in OS/file system layer, consuming resources(CPU, IO, mem) of the host system. You may google symantec storage foundation for more information.

At DB level, you can do database replication / log shipping to replicate the data. SQL server has such facilities.

The "cheapest" solution would be modify your app, like your 4), but I suggest you use a message queue to replicate the data to minimize the impact on performance.



回答3:

One option you might want to look into is SQL Server Integrated Change Tracking (part of SQL2008 or higher). This is an incredibly efficient way of finding the changes that have occurred in your SQL Server database (including deletes), has very little impact on your SQLDB, does not require triggers and provides a good way of allowing you to then move the data changes to Hadoop.

Full disclosure, I work on Cotega and this data sync is something we are making a big focus on. I am happy to help more if this is a direction you are interested in taking.