Cell versioning with Cassandra

Posted 2019-05-17 01:17

Question:

My application uses an AbstractFactory for the DAO layer, so once the HBase DAO family has been implemented, it would be great to create a Cassandra DAO family and compare the two from several points of view.
While trying to do that, I found that Cassandra doesn't support cell versioning the way HBase does (and my application relies heavily on it), so I was wondering whether there is a table design trick (or something else) to "emulate" this behaviour in Cassandra.

Answer 1:

One common strategy is to use composite column names with two components: the normal column name, and a version. What you use for the version component depends on your access patterns. If you might have updates coming from multiple clients simultaneously, then using a TimeUUID is your safest option. If only one client may update at a time, you can use something smaller, like a timestamp or version number.
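To make the two choices of version component concrete, here is a minimal Python sketch. It assumes a version 1 (time-based) UUID from the standard `uuid` module as a stand-in for a TimeUUID; the helper names `next_timeuuid_version` and `next_counter_version` are hypothetical and exist only for illustration. Sorting by the embedded timestamp here only approximates how a TimeUUID comparator would order columns.

```python
import uuid


def next_timeuuid_version():
    """Version tag for concurrent writers: a time-based (version 1) UUID."""
    return uuid.uuid1()


def next_counter_version(current):
    """Version tag for a single writer: the previous version number plus one."""
    return current + 1


a = next_timeuuid_version()
b = next_timeuuid_version()

# A version 1 UUID embeds a 60-bit timestamp; ordering by it (with the raw
# bytes as a tie-breaker) approximates the order in which tags were created.
ordered = sorted([b, a], key=lambda u: (u.time, u.bytes))
```

The UUID variant needs no coordination between clients, at the cost of a 16-byte version component; the counter is smaller but assumes the writer knows the current version.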

Assuming you use version numbers for simplicity, here's what that might look like for storing documents with versioned fields:

| ('body', 5) | ('body', 4) | ... | ('title', 1) | ('title', 0) |
|-------------|-------------|-----|--------------|--------------|
| 'Neque ...' | 'Dolor ...' | ... | 'Lorem Ipsum'| 'My Document'|

This format is primarily useful if you want a specific version of a field, all versions of a field, or all versions of all fields.
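The three access patterns can be sketched with an in-memory stand-in for a single wide row. This is not Cassandra client code: `VersionedRow` and its methods are hypothetical names, and a plain dictionary keyed by `(field, version)` plays the role of the composite column names.

```python
class VersionedRow:
    """In-memory model of one row whose column names are (field, version)."""

    def __init__(self):
        self._columns = {}  # (field, version) -> value

    def put(self, field, version, value):
        # Each version gets its own column, so older values are never overwritten.
        self._columns[(field, version)] = value

    def get(self, field, version):
        """A specific version of a field, or None if it does not exist."""
        return self._columns.get((field, version))

    def versions_of(self, field):
        """All versions of one field as (version, value) pairs, newest first."""
        pairs = ((v, val) for (f, v), val in self._columns.items() if f == field)
        return sorted(pairs, reverse=True)

    def all_versions(self):
        """All versions of all fields, by field name and then newest first."""
        return sorted(self._columns.items(), key=lambda kv: (kv[0][0], -kv[0][1]))


doc = VersionedRow()
doc.put("title", 0, "My Document")
doc.put("title", 1, "Lorem Ipsum")
doc.put("body", 4, "Dolor ...")
doc.put("body", 5, "Neque ...")
```

With this data, `doc.all_versions()` returns the columns in the same order as the table above: both `body` versions, newest first, followed by both `title` versions.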

If you also want to efficiently fetch the latest version of all fields at once, I suggest you denormalize and add a second column family where only the latest version of each field is stored in its normal form. You can blindly overwrite these fields on each change. Continuing our example, this column family would look like:

|   'body'    |    'title'    |
|-------------|---------------|
| 'Neque ...' | 'Lorem Ipsum' |
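
The dual-write pattern can be sketched in the same in-memory style. `DocumentStore` is a hypothetical name, the two dictionaries stand in for the two column families, and the sketch assumes changes arrive in version order so that the blind overwrite always leaves the newest value in place.

```python
class DocumentStore:
    """In-memory model of the versioned column family plus a 'latest' one."""

    def __init__(self):
        self.versions = {}  # (field, version) -> value, full history
        self.latest = {}    # field -> value, newest value only

    def write(self, field, version, value):
        # Append to the history, then blindly overwrite the denormalized copy.
        self.versions[(field, version)] = value
        self.latest[field] = value

    def read_latest(self):
        """Latest value of every field, read without touching the history."""
        return dict(self.latest)


store = DocumentStore()
store.write("title", 0, "My Document")
store.write("title", 1, "Lorem Ipsum")
store.write("body", 4, "Dolor ...")
store.write("body", 5, "Neque ...")
current = store.read_latest()
```

Every change costs two writes, but reading the current document becomes a single lookup that never has to scan past older versions.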