Why NoSQL is better at “scaling out” than RDBMS?

2019-01-16 09:21发布

问题:

I have read the following text in a technical blog discussing the advantages and disadvantages of NoSQL

" For years, in order to improve performance on database servers, database administrators have had to buy bigger servers as the database load increases (scaling up) instead of distributing the database across multiple “hosts” as the load increases (scaling out). RDBMS do not typically scale out easily, but the newer NoSQL databases are actually designed to expand easily to take advantage of new nodes and are usually designed with low-cost commodity hardware in mind. "

I became confused about the scalability of RDBMS and NoSQL.

My confusion are:

  1. Why RDBMS are less able to scale out? And the reason of buying bigger servers instead of buying more cheap ones.
  2. Why NoSQL is more able to scale out?

回答1:

RDBMS have ACID ( http://en.wikipedia.org/wiki/ACID ) and supports transactions. Scaling "out" with RDBMS is harder to implement due to these concepts.

NoSQL solutions usually offer record-level atomicity, but cannot guarantee a series of operations will succeed (transaction).

It comes down to: to keep data integrity and support transactions, a multi-server RDBMS would need to have a fast backend communication channel to synchronize all possible transactions and writes, while preventing/handling deadlock.

This is why you usually only see 1 master (writer) and multiple slaves (readers).



回答2:

So I've been trying to figure out the real bottom-line when it comes to NoSQL vs RDBMS myself, and always end up with a response that doesn't quite cut it. In my search there are really 2 primary differences between NoSQL and SQL, with only 1 being a true advantage.

  1. ACID vs BASE - NoSQL typically leaves out some of the ACID features of SQL, sort of 'cheating' it's way to higher performance by leaving this layer of abstraction to the programmer. This has already been covered by previous posters.

  2. Horizontal Scaling - The real advantage of NoSQL is horizontal scaling, aka sharding. Considering NoSQL 'documents' are sort of a 'self-contained' object, objects can be on different servers without worrying about joining rows from multiple servers, as is the case with the relational model.

Let's say we want to return an object like this:

post {
    id: 1
    title: 'My post'
    content: 'The content'
    comments: {
      comment: {
        id: 1
      }
      comment: {
        id: 2
      }
      ...

    views: {
      view: {
        user: 1
      }
      view: {
        user: 2
      }
      ...
    }
}

In NoSQL, that object would basically be stored as is, and therefore can reside on a single server as a sort of self-contained object, without any need to join with data from other tables that could reside on other DB servers.

However, with Relational DBs, the post would need to join with comments from the comments table, as well as views from the views table. This wouldn't be a problem in SQL ~UNTIL~ the DB is broken into shards, in which case 'comment 1' could be on one DB server, while 'comment 2' yet on another DB server. This makes it much more difficult to create the very same object in a RDBMS that has been scaled horizontally than in a NoSQL DB.

Would any DB experts out there confirm or argue these points?



回答3:

Typical RDBMSs make strong guaranties about consistency. This requires to some extend communication between nodes for every transaction. This limites the ability to scale out, because more nodes means more communication

NoSql systems make different trade offs. For example they don't guarantee that a second session will see immediately data commited by a first session. Thereby decoupling the transaction of storing some data from the process of making that data available for every user. Google "eventually consistent". So a single transaction doesn't need to wait for any (or for much less) inter node communication. Therefore they are able to utilize a large amount of nodes much more easily.



回答4:

For a NO SQL, 1.All the child related to a collection is at the same place and so on same server and there is no join operation to lookup data from another server .

2.There is no schema so no Locks needed on any server and the transaction handling is left to the clients .

The above 2 saves a lot of overhead of scaling in NO-SQL.