Where would a scaled relational DB fall in the CAP theorem?

Posted 2019-08-26 07:42

If you have a scaled SQL Server setup with one DB for writes and multiple DBs for reads, wouldn't there be a delay before data is replicated from the write DB to the other read databases? In that case, isn't the data inconsistent?

So where would a scaled relational DB fall in the CAP theorem?

Update:

In relational DBs, consistency means there won't be partial updates. For example, if someone transfers money from one account to another and the whole thing is part of one transaction, it won't happen that money is taken out of one account but never shows up in the other.
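
To make that "no partial updates" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module (not SQL Server, but the transactional idea is the same): the transfer either commits as a whole or is rolled back, so a debit is never visible without the matching credit. The account names and the failure case are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        id      TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; the whole transfer is one transaction."""
    try:
        with conn:  # transaction: commits on success, rolls back on any exception
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))  # may raise IntegrityError (overdraft)
            cur = conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                               (amount, dst))
            if cur.rowcount == 0:
                # Hypothetical failure mid-transfer: destination account missing.
                raise ValueError("no such account: " + dst)
    except (sqlite3.IntegrityError, ValueError) as exc:
        print("transfer rolled back:", exc)

transfer(conn, "alice", "bob", 30)      # commits: alice 70, bob 30
transfer(conn, "alice", "nobody", 10)   # debit happens, then the whole thing is rolled back
print(dict(conn.execute("SELECT id, balance FROM accounts")))  # {'alice': 70, 'bob': 30}
```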

In the CAP theorem, consistency means that all components see the same data. That consistency is different from the consistency in ACID.

From what I know, relational DBs like SQL Server are supposed to be CA (consistent and available). This makes sense if there is just one database, because everyone would see the same data. But what if SQL Server is scaled out across multiple databases? Would all of the databases still see the same data? If not, would the system still be consistent (in the CAP sense)?

My feeling is that a scaled relational DB is AP (available and partition tolerant) rather than CA (consistent and available).
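
To illustrate why I suspect this, here is a toy model (purely my own sketch, not how SQL Server replication actually works) in which a read replica applies the primary's writes after a delay. A read from the replica right after a write can return stale data, which is exactly the CAP sense of inconsistency:

```python
import threading
import time

class Primary:
    def __init__(self):
        self.data = {}
        self.log = []          # replication log of (key, value) writes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

class Replica:
    def __init__(self, primary, lag_seconds):
        self.data = {}
        self.applied = 0       # how many log entries we've applied so far
        self.primary = primary
        self.lag = lag_seconds
        threading.Thread(target=self._apply_loop, daemon=True).start()

    def _apply_loop(self):
        while True:
            time.sleep(self.lag)                       # simulated replication delay
            while self.applied < len(self.primary.log):
                key, value = self.primary.log[self.applied]
                self.data[key] = value
                self.applied += 1

    def read(self, key):
        return self.data.get(key)                      # may be stale

primary = Primary()
replica = Replica(primary, lag_seconds=0.5)

primary.write("balance:alice", 100)
print("replica sees:", replica.read("balance:alice"))  # likely None: not replicated yet
time.sleep(1)
print("replica sees:", replica.read("balance:alice"))  # 100 after the lag
```

In CAP terms, if the replica keeps answering reads during that lag window, the system is choosing availability over consistency.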

1 Answer
来，给爷笑一个 · 2019-08-26 08:29

I've read different definitions of consistency with respect to the CAP theorem.

  1. Some definitions of consistency say that once some data is persisted in a system, all subsequent reads will return the most recently written data. By this definition, a replicated database (you call this "scaled", but I wouldn't use that term) risks returning inconsistent data if the replication is asynchronous.

    To mitigate this risk, some systems make sure replication is synchronous, or as close to synchronous as they can implement. Galera, for example, sends transaction write sets to its replicas synchronously. If you try to read from a replica and it detects that write sets are pending but not yet applied, it can block your read until it has caught up with the pending write sets (this behavior is configurable). So you'll never read data that is out of date. A toy sketch of this read-gating idea appears after this list.

    Maintaining perfectly consistent reads across a distributed system in this manner is usually more expensive than users are willing to accept, and it becomes a performance bottleneck when the rate of updates is high. So for practical reasons, most projects accept "replication lag" as a necessary compromise.

  2. Other definitions of consistency are closer to atomicity, i.e. transactions will not be persisted in a partially-complete state. So all constraints will be satisfied when you read the data, whether you read the data before or after the transaction is applied. In this definition, it's quite easy to imagine the replica database instance remaining consistent, if it applies updates using the same transaction semantics used on the master. If you read data from the replica, you might read data that hasn't yet had the latest updates applied, but it will never be in an inconsistent state with respect to constraints.
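
Here is the toy sketch of the read-gating idea from point 1 (my own illustration in Python, not anything Galera actually ships): the replica refuses to answer a read until it has applied every write set the primary had committed when the read arrived, so it can be slow but never stale.

```python
import threading
import time

class Primary:
    def __init__(self):
        self.writes = []              # ordered write sets: (key, value)

    def commit(self, key, value):
        self.writes.append((key, value))

    def last_seqno(self):
        return len(self.writes)       # sequence number of the latest committed write set

class CausalReplica:
    def __init__(self, primary, apply_delay):
        self.primary = primary
        self.apply_delay = apply_delay
        self.data = {}
        self.applied_seqno = 0
        self.caught_up = threading.Condition()
        threading.Thread(target=self._apply_loop, daemon=True).start()

    def _apply_loop(self):
        while True:
            if self.applied_seqno < self.primary.last_seqno():
                time.sleep(self.apply_delay)           # simulated apply latency
                key, value = self.primary.writes[self.applied_seqno]
                with self.caught_up:
                    self.data[key] = value
                    self.applied_seqno += 1
                    self.caught_up.notify_all()
            else:
                time.sleep(0.01)

    def read(self, key):
        target = self.primary.last_seqno()             # what "up to date" means right now
        with self.caught_up:
            # Block until every write set known at read time has been applied.
            self.caught_up.wait_for(lambda: self.applied_seqno >= target)
            return self.data.get(key)

primary = Primary()
replica = CausalReplica(primary, apply_delay=0.3)

primary.commit("x", 1)
primary.commit("x", 2)
print(replica.read("x"))   # blocks briefly, then prints 2 -- never a stale value
```

The sketch only shows the principle: consistent reads are bought with extra read latency whenever the replica is behind, which is the trade-off described above.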
