How does column-oriented NoSQL differ from documen

The three types of NoSQL databases I've read about are key-value, column-oriented, and document-oriented.

Key-value is pretty straightforward - a key with a plain value.

I've seen document-oriented databases described as like key-value, but the value can be a structure, like a JSON object. Each "document" can have all, some, or none of the same keys as another.

Column-oriented seems to be very much like document-oriented in that you don't specify a structure.

So what is the difference between these two, and why would you use one over the other?

I've specifically looked at MongoDB and Cassandra. I basically need a dynamic structure that can change, but not affect other values. At the same time I need to be able to search/filter specific keys and run reports. With CAP, AP is the most important to me. The data can "eventually" be synced across nodes, just as long as there is no conflict or loss of data. Each user would get their own "table".

Tags: mongodb cassandra nosql

3 answers

在下西门庆

Reply #2 · 2020-05-11 04:11

For "inserts", to use RDBMS terms, a document-based store is more consistent and straightforward. Note that Cassandra lets you achieve consistency through quorum reads and writes, but that won't apply to all column-based systems, and it reduces availability. On a write-once / read-often system, go for MongoDB. Also consider it if you always plan to read the whole structure of the object. A document-based system is designed to return the whole document when you fetch it, and is not very strong at returning only parts of it.
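To make the quorum trade-off concrete, here is a rough sketch in Python. It is not Cassandra's API; the function names are made up for illustration. It only encodes the usual rule that with N replicas, a write acknowledged by W of them and a read that consults R of them must overlap when W + R > N.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def read_sees_latest_write(n: int, w: int, r: int) -> bool:
    """True when every read set must share a replica with every write set."""
    return w + r > n


def tolerated_failures(n: int, required_acks: int) -> int:
    """Replicas that can be down while `required_acks` are still reachable."""
    return n - required_acks


n = 3            # replication factor (assumed for the example)
q = quorum(n)    # majority of 3 replicas

# Quorum writes + quorum reads: consistent, but fewer failures tolerated.
print(q, read_sees_latest_write(n, q, q), tolerated_failures(n, q))

# Single-replica writes + reads: more available, but reads may be stale.
print(read_sees_latest_write(n, 1, 1), tolerated_failures(n, 1))
```

The second case is the "eventually consistent" setting: it keeps working with more nodes down, which is the availability you give up when you ask for a quorum.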

Column-based systems like Cassandra are much better than document-based ones at "updates". You can change the value of a column without even reading the row that contains it. The write doesn't actually need to happen on the same server; a row may be spread across multiple files on multiple servers. On a huge, fast-evolving data system, go for Cassandra. Also consider it if you plan to have very big chunks of data per key and won't need to load all of it on each query. For "selects", Cassandra lets you load only the columns you need.
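A toy model of that idea, in Python. This is a simplified illustration, not how Cassandra is actually implemented; the class and method names are hypothetical. Writes are appended blindly to a log without touching the existing row, and a read only assembles the columns that were asked for.

```python
class ToyColumnStore:
    """Append-only store: a write never reads the row it belongs to."""

    def __init__(self):
        # Each entry is (row_key, column_name, value), in arrival order.
        self._log = []

    def write(self, row_key, column, value):
        # Blind write: just append, no lookup of the current row.
        self._log.append((row_key, column, value))

    def read(self, row_key, columns):
        # Assemble only the requested columns; later entries win.
        wanted = set(columns)
        result = {}
        for key, column, value in self._log:
            if key == row_key and column in wanted:
                result[column] = value
        return result


store = ToyColumnStore()
store.write("user42", "name", "Ann")
store.write("user42", "email", "ann@example.com")
store.write("user42", "email", "ann@example.org")  # update, no read needed

print(store.read("user42", ["email"]))
```

A document store in this picture would instead replace or return the whole value stored under "user42".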

Also consider that MongoDB is written in C++ and is at its second major release, while Cassandra needs to run on a JVM, and its first major release has been in release candidate only since yesterday (although the 0.X releases are already in production at major companies).

On the other hand, Cassandra's design was partly based on Amazon Dynamo, and it is built at its core to be a high-availability solution, but that has nothing to do with the column-based format. MongoDB scales out too, but not as gracefully as Cassandra.


狗以群分

Reply #3 · 2020-05-11 04:15

In Cassandra, each row (addressed by a key) contains one or more "columns". Columns are themselves key-value pairs. The column names need not be predefined, i.e. the structure isn't fixed. Columns in a row are stored in sorted order according to their keys (names).

In some cases, you may have very large numbers of columns in a row (e.g. to act as an index to enable particular kinds of query). Cassandra can handle such large structures efficiently, and you can retrieve specific ranges of columns.
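As a rough picture of why sorted column names make range retrieval cheap, here is a small Python sketch. It is an in-memory stand-in, not Cassandra client code; the class name and the date-style column names are assumptions for the example. Because the names are kept sorted, a range query is two binary searches plus a contiguous slice.

```python
import bisect


class WideRow:
    """One row whose columns are kept sorted by column name."""

    def __init__(self):
        self._names = []    # sorted list of column names
        self._values = {}   # column name -> value

    def put(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)  # keep names in sorted order
        self._values[name] = value

    def slice(self, start, end):
        """Return (name, value) pairs with start <= name <= end."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# A row used as an index: one column per day, inserted out of order.
row = WideRow()
for day, count in [("2020-05-12", 7), ("2020-05-09", 3),
                   ("2020-05-11", 5), ("2020-05-10", 4)]:
    row.put(day, count)

print(row.slice("2020-05-10", "2020-05-11"))
```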

There is a further level of structure (not so commonly used) called super-columns, where a column contains nested (sub)columns.

You can think of the overall structure as a nested hashtable/dictionary, with 2 or 3 levels of key.

Normal column family:

row
    col  col  col ...
    val  val  val ...

Super column family:

row
      supercol                      supercol                     ...
          (sub)col  (sub)col  ...       (sub)col  (sub)col  ...
           val       val      ...        val       val      ...
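The same two layouts written as plain Python dictionaries, just to show the "2 or 3 levels of key" idea. The row keys, column names and values are invented for the example.

```python
# Normal column family: row key -> column name -> value (2 levels).
# Rows need not share the same set of columns.
column_family = {
    "user42": {"name": "Ann", "email": "ann@example.com"},
    "user43": {"name": "Bob", "phone": "555-0100"},
}

# Super column family: row key -> super column -> (sub)column -> value (3 levels).
super_column_family = {
    "user42": {
        "home": {"city": "Oslo", "zip": "0150"},
        "work": {"city": "Bergen"},
    },
}

# Addressing a single value means supplying every level of key.
print(column_family["user43"]["phone"])
print(super_column_family["user42"]["work"]["city"])
```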

There are also higher-level structures - column families and keyspaces - which can be used to divide up or group together your data.

See also this question: "Cassandra: What is a subcolumn"

Or the data modelling links from http://wiki.apache.org/cassandra/ArticlesAndPresentations

Re: comparison with document-oriented databases - the latter usually insert whole documents (typically JSON), whereas in Cassandra you can address individual columns or supercolumns, and update these individually, i.e. they work at a different level of granularity. Each column has its own separate timestamp/version (used to reconcile updates across the distributed cluster).
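Here is a minimal sketch of what per-column timestamps buy you, assuming a simple "highest timestamp wins" rule per column. The function name and the data are hypothetical, and real reconciliation has more cases (deletes, ties) than this shows.

```python
def reconcile(replica_a, replica_b):
    """Merge two copies of a row, column by column.

    Each replica maps column name -> (value, timestamp).
    For every column, the entry with the higher timestamp is kept.
    """
    merged = dict(replica_a)
    for name, (value, timestamp) in replica_b.items():
        if name not in merged or timestamp > merged[name][1]:
            merged[name] = (value, timestamp)
    return merged


# Two nodes accepted different updates to the same row.
node_1 = {"name": ("Ann", 100), "email": ("ann@example.com", 100)}
node_2 = {"email": ("ann@example.org", 250), "phone": ("555-0100", 200)}

print(reconcile(node_1, node_2))
```

Because each column carries its own timestamp, the newer email and the untouched name both survive. If the whole row were one document with a single version, one of the two updates would overwrite the other.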

The Cassandra column values are just bytes, but can be typed as ASCII, UTF8 text, numbers, dates etc.

Of course, you could use Cassandra as a primitive document store by inserting columns containing JSON - but you wouldn't get all the features of a real document-oriented store.


甜甜的少女心

Reply #4 · 2020-05-11 04:20

The main difference is that document stores (e.g. MongoDB and CouchDB) allow arbitrarily complex documents, i.e. subdocuments within subdocuments, lists with documents, etc. whereas column stores (e.g. Cassandra and HBase) only allow a fixed format, e.g. strict one-level or two-level dictionaries.
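To illustrate the nesting difference, here is a short Python sketch. The sample data and the colon-separated column naming are assumptions for the example, not a convention of any particular database. The document nests freely; to fit a one-level row, the same information has to be flattened into composite column names.

```python
def depth(value):
    """Number of dictionary levels in a nested structure (lists add none)."""
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return max((depth(v) for v in value), default=0)
    return 0


# Document store: subdocuments inside lists inside the document.
document = {
    "name": "Ann",
    "orders": [
        {"id": 1, "items": [{"sku": "A1", "qty": 2}]},
    ],
}

# Column store row: one level, so nesting is encoded in the column names.
flat_row = {
    "name": "Ann",
    "order:1:item:A1:qty": "2",
}

print(depth(document), depth(flat_row))
```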
