My understanding that rows are overwritten when another row with identical primary keys is inserted.
For example:
I have columns (user_id int, item_id int, site_id int)
, and my PRIMARY KEY(user_id, item_id)
If I had the following table:
user_id, item_id, site_id
2 3 4
and I insert user_id : 2, item_id : 3, site_id : 10
, my new table would be:
user_id, item_id, site_id
2 3 10
not
user_id, item_id, site_id
2 3 4
2 3 10
Is this simple case hold in all cases? Are any subtleties that I likely not aware off? Also, I could not find this in the docs and came to this conclusion by playing around with cassandra, can anyone provide a doc source?
Yes, this is how Cassandra is designed to operate. In all cases where an
UPDATE
orINSERT
is executed, data will be updated (based on the keys) if it exists, and inserted it it does not. An important point to remember, is that under the hood,UPDATE
andINSERT
are synonymous. If you think about those two as being the same, then you can start to understand why it works the way that it does.That being said, you are correct, in that you do have to look closely to find an explicit reference to this behavior in the documentation. I found the closest references in the docs and listed them below:
From the UPDATE documentation:
From the INSERT documentation:
Now while these excerpts may not come right out and say "be careful not to overwrite", I did manage to find an article on Planet Cassandra that was more explicit: How to Do an Upsert in Cassandra