Inserting null values into cassandra

2019-06-22 14:51发布

问题:

I have some fields that I am storing into Cassandra, but some of them could be null at any given point. As there are quite a lot of them, it makes the code much more readable if I don’t check each one for null before adding it to the INSERT.

Is there any harm in doing so?

EDIT!!

There is a jira ticket that I found. But I am unable to understand what solution was finally implemented from the ticket. https://issues.apache.org/jira/browse/CASSANDRA-7304

回答1:

The beautiful thing about Cassandra's new storage engine is the ability to NOT store values. What it means is what it was meant to be: a null value is simply a value that should not be there.

This gives great flexibility, because a null value not explicitly (or implicitly, see later) inserted won't take storage space, nor use processing power and IO bandwidth.

Indeed, it is pretty easy to populate a row with null values:

INSERT INTO mytable (pk, c2, c3) VALUES (0x1234, null, null);

This way you are explicitly telling C* to store a null value in both c2 and c3. However, you could get the same macroscopic effect with:

INSERT INTO mytable (pk) VALUES (0x1234);

Notice that I say macroscopic effect, because when you explicitly insert a null value, C* will insert a tombstone nuder the hood. In the long run this will bite you, due to how C* perform searches, compactions, etc... so you should avoid whenever possible, the second version will perform much better.

Now, there is also a trap: you can also create tombstones implicitly. This will happen when you use the TTL features builtin in Cassandra.

In conclusion, if you care about yourself I'd suggest to NOT performing any null value inserts. Do a check at application level, you'll save time (and money) later, eg during reads.



回答2:

Inserting a null value creates a tombstone.
You should not create tombstone :
1. Tombstone take up space and can substantially increase the amount of storage you require.
2. Querying tables with a large number of tombstones causes performance problems and it causes Latency and heap pressure.