Is Cassandra unable to store relationships that cr

Posted 2019-02-28 06:14

Question:

I've noticed that relationships cannot be properly stored in C* because of its 100 MB partition limit. Denormalization doesn't help in this case, and neither does the fact that C* can have 2B cells per partition: 2B cells of plain Longs alone take 16 GB. Doesn't that exceed the 100 MB partition size limit?

This is what I don't understand in general: C* claims a partition can hold 2B cells, yet a partition's size should not exceed 100 MB?

What is the idiomatic way to do this? People say this is an ideal use case for TitanDB or JanusDB, which scale well to billions of nodes and edges. How do these databases, which use C* under the hood, model this data?

My use case is described here: https://groups.google.com/forum/#!topic/janusgraph-users/kF2amGxBDCM

Note that I'm fully aware that the usual answer to this question is "use an extra partition key to decrease partition size", but honestly, who of us has this possibility? Especially when modeling relationships... I'm not interested in relationships that happened in a particular hour.

Answer 1:

The maximum number of cells (rows × columns) in a partition is 2 billion, and the maximum size of a single column value is 2 GB (1 MB is recommended).

Source: http://docs.datastax.com/en/cql/3.1/cql/cql_reference/refLimits.html

A partition size of 100 MB is not an upper limit. The DataStax documentation says:

For efficient operation, partitions must be sized within certain limits in Apache Cassandra™. Two measures of partition size are the number of values in a partition and the partition size on disk. Sizing the disk space is more complex, and involves the number of rows and the number of columns, primary key columns and static columns in each table. Each application will have different efficiency parameters, but a good rule of thumb is to keep the maximum number of rows below 100,000 items and the disk size under 100 MB

As you can see, this is only a rule of thumb for efficient operation and low heap pressure: keep a single partition under 100,000 rows and under 100 MB on disk.
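To put the two numbers side by side, here is a rough back-of-the-envelope calculation. It assumes 8 bytes per Long and decimal units (1 MB = 1,000,000 bytes), and it counts only the raw values, ignoring Cassandra's per-cell storage overhead, so real partitions would be larger. The constant and function names are just illustrative.

```python
# Assumptions: 8-byte longs, decimal megabytes, no per-cell storage overhead.
BYTES_PER_LONG = 8
MAX_CELLS_PER_PARTITION = 2_000_000_000       # hard limit on cells per partition
RULE_OF_THUMB_BYTES = 100 * 1000 * 1000       # recommended 100 MB per partition


def raw_size_bytes(cells, bytes_per_cell=BYTES_PER_LONG):
    """Raw payload size of `cells` values, ignoring any storage overhead."""
    return cells * bytes_per_cell


# Size of a partition filled to the hard cell limit with longs.
size_at_hard_limit = raw_size_bytes(MAX_CELLS_PER_PARTITION)

# How many raw longs fit in the recommended 100 MB.
longs_in_rule_of_thumb = RULE_OF_THUMB_BYTES // BYTES_PER_LONG

print(size_at_hard_limit / 1e9, "GB at the 2B-cell hard limit")
print(longs_in_rule_of_thumb, "longs fit in 100 MB")
```

So the 16 GB figure from the question is what the hard limit allows, while the 100 MB guideline corresponds to roughly 12.5 million raw Long values. The two numbers describe different things: one is what Cassandra permits, the other is what is recommended for efficient operation.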


TitanDB and JanusDB store graphs in adjacency list format, which means that a graph is stored as a collection of vertices, each with its adjacency list. The adjacency list of a vertex contains all of the vertex's incident edges (and properties).

They use the VertexID as the partition key, the PropertyKeyID or EdgeID as the clustering key, and the property value or edge properties as a regular column.

If you use Cassandra as the storage backend for TitanDB or JanusDB, the same rule applies for efficient operation and low heap pressure: keep the number of edges and properties of a vertex under 100,000 and their total size under 100 MB.
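As a rough illustration of that layout, here is a minimal in-memory sketch in Python. It is only an analogy for the partition key / clustering key structure described above, not the actual storage format or API of TitanDB, JanusDB, or Cassandra. The class `AdjacencyStore` and its methods are hypothetical names made up for this example.

```python
from collections import defaultdict


class AdjacencyStore:
    """Hypothetical in-memory model of an adjacency-list layout.

    vertex_id plays the role of the partition key, column_id (a property
    key ID or edge ID) plays the role of the clustering key, and value is
    the property value or the edge's properties.
    """

    def __init__(self):
        # vertex_id -> {column_id: value}; one inner dict per "partition".
        self._partitions = defaultdict(dict)

    def add_cell(self, vertex_id, column_id, value):
        self._partitions[vertex_id][column_id] = value

    def partition(self, vertex_id):
        """All cells of one vertex, ordered by column_id like a clustering key."""
        return sorted(self._partitions.get(vertex_id, {}).items())

    def cell_count(self, vertex_id):
        return len(self._partitions.get(vertex_id, {}))

    def exceeds_rule_of_thumb(self, vertex_id, max_cells=100_000):
        """True if the vertex holds more edges/properties than the guideline."""
        return self.cell_count(vertex_id) > max_cells


store = AdjacencyStore()
store.add_cell(vertex_id=1, column_id=20, value={"edge_to": 7, "since": 2018})
store.add_cell(vertex_id=1, column_id=10, value="Alice")
print(store.partition(1))
print(store.exceeds_rule_of_thumb(1))
```

Every edge and property of a vertex lands in that vertex's single partition, which is why a vertex with a very large number of edges runs into the same partition-size guideline.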