Cassandra: Column Family vs Super Column Family

Posted 2019-03-21 13:25

Question:

I have a requirement where I need my database to store the following data:

- For each build, store the results of 3 performance runs. Each result includes TPS and latency.

From what I have read about the Cassandra data model, this maps directly to a super column family of the following format:

BenchmarkSuperColumnFamily= {

build_1: {
   Run1: {1000K, 0.5ms}
   Run2: {1000K, 0.5ms}
   Run3: {1000K, 0.5ms}
}

build_2: {
   Run1: {1000K, 0.5ms}
   Run2: {1000K, 0.5ms}
   Run3: {1000K, 0.5ms}
}
...

}

However, I read in the following answer that the use of super column families is discouraged. Is there a better way to model my requirement?

P.S. I borrowed the JSON-ish notation from the following article.

Answer 1:

The StackOverflow answer that you linked to is correct: you shouldn't use SuperColumns in new applications. They remain available for backwards compatibility, however.

In general, composite columns can be used to mimic any model provided by super columns. They let you split each column name into multiple components. So if you were to specify a comparator of 'CompositeType(UTF8Type, UTF8Type)', your data model would end up looking like this:

BenchmarkColumnFamily= {

   build_1: {
       (Run1, TPS) : 1000K
       (Run1, Latency) : 0.5ms
       (Run2, TPS) : 1000K
       (Run2, Latency) : 0.5ms
       (Run3, TPS) : 1000K
       (Run3, Latency) : 0.5ms
   }

   build_2: {
       ...
   }
...

}

With the above model, columns within a row are sorted by their composite names, so a single query can fetch one data point for one run, all data points for one run, or all data points for a contiguous range of runs.
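To illustrate why one query covers all three cases, here is a minimal Python sketch that mimics a row of composite columns with a sorted list of tuples. It is not Cassandra client code: `benchmark_cf`, `slice_columns`, `get_point`, `get_run`, `get_runs` and the `HIGH` sentinel are hypothetical names made up for this example, and the assumption is simply that a composite comparator sorts names component by component, the way Python sorts tuples.

```python
from bisect import bisect_left, bisect_right

# Highest Unicode code point, used as an upper bound for the second
# component of a composite name (an assumption of this sketch only).
HIGH = chr(0x10FFFF)

def make_row(columns):
    """Keep columns sorted by composite name, as a comparator would."""
    return sorted(columns.items())

# Hypothetical in-memory stand-in for the column family in the answer.
benchmark_cf = {
    "build_1": make_row({
        ("Run1", "TPS"): "1000K",
        ("Run1", "Latency"): "0.5ms",
        ("Run2", "TPS"): "1000K",
        ("Run2", "Latency"): "0.5ms",
        ("Run3", "TPS"): "1000K",
        ("Run3", "Latency"): "0.5ms",
    }),
}

def slice_columns(row, start, end):
    """Return the columns whose composite name lies in [start, end]."""
    names = [name for name, _ in row]
    return row[bisect_left(names, start):bisect_right(names, end)]

def get_point(cf, build, run, metric):
    """One data point for one run."""
    cols = slice_columns(cf[build], (run, metric), (run, metric))
    return cols[0][1] if cols else None

def get_run(cf, build, run):
    """All data points for one run: a slice on the first component."""
    return slice_columns(cf[build], (run,), (run, HIGH))

def get_runs(cf, build, first_run, last_run):
    """All data points for a contiguous range of runs."""
    return slice_columns(cf[build], (first_run,), (last_run, HIGH))
```

Each helper is the same slice over one sorted row with different start and end bounds, which is the property the composite comparator provides. For example, `get_run(benchmark_cf, "build_1", "Run1")` returns the Latency and TPS columns of Run1, in that order.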

More info on composite columns: http://www.datastax.com/dev/blog/introduction-to-composite-columns-part-1