Cassandra column key auto increment

Posted 2019-01-23 13:16

I am trying to understand Cassandra and how to structure my column families (CF) but it's quite hard since I am used to relational databases.

For example, if I create a simple users CF and try to insert a new row, how can I make an incremental key like in MySQL?

I saw a lot of examples where you would just use the username instead of a unique ID, and that makes some sense, but what if I want users to be able to have duplicate usernames?

Also, how can I run searches? From what I understand, Cassandra does not support > operators, so something like select * from users where something > something2 would not work.

And probably the most important question: what about grouping? Would I need to retrieve all the data and then filter it in whatever language I am using? I think that would slow down my system a lot.

So basically I need a brief explanation of how to get started with Cassandra.

3 answers
仙女界的扛把子
#2 · 2019-01-23 13:33

Your questions are quite general, but let me take a stab at it. First, you need to model your data in terms of your queries. With an RDBMS, you model your data in some normalized form, then optimize later for your specific queries. You cannot do this with Cassandra; you must write your data the way you intend to read it. Often this means writing it more than one way. In general, it helps to completely shed your RDBMS thinking if you want to work effectively with Cassandra.

Regarding keys:

  • They are used in Cassandra as the unit of distribution across the ring. So your key will get hashed and assigned an "owner" in the ring. Use the RandomPartitioner to get an even distribution.

  • Presuming you use RandomPartitioner (you should), keys are not sorted. This means you cannot ask for a range of keys. You can, however, ask for a list of keys in a single query.

  • Keys are relevant in some models and not in others. If your model requires query-by-key, you can use any unique value that your application is aware of (such as a UUID). Sometimes keys are sentinel values, such as a Unix epoch representing the start of the day. This allows you to hand Cassandra a bunch of known keys, then get a range of data sorted by column (see below).
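
The two kinds of key mentioned above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names (new_user_key, day_bucket_key) and the user dictionaries are made up for the example, and nothing here talks to Cassandra. It shows how an application could mint a unique row key without an auto-increment counter, and how to compute a "start of the day" sentinel key.

```python
import uuid
from datetime import datetime, timezone

def new_user_key() -> str:
    """Random (version 4) UUID: unique without coordination between nodes."""
    return str(uuid.uuid4())

def day_bucket_key(ts: datetime) -> int:
    """Sentinel key: Unix epoch (seconds) of the start of the UTC day containing ts."""
    start = ts.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return int(start.timestamp())

# Two users may share a username because the row key is the UUID, not the name.
# (Hypothetical records; the field names are assumptions for this sketch.)
alice_1 = {"key": new_user_key(), "username": "alice"}
alice_2 = {"key": new_user_key(), "username": "alice"}

# Every event on the same UTC day maps to the same known key.
morning = datetime(2019, 1, 23, 8, 0, tzinfo=timezone.utc)
afternoon = datetime(2019, 1, 23, 13, 16, tzinfo=timezone.utc)
bucket = day_bucket_key(afternoon)
```

Because the application can recompute day_bucket_key for any date, it always knows which keys to ask for, which is what makes the "hand Cassandra a bunch of known keys" pattern workable.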

Regarding query predicates:

  • You can get ranges of data presuming you model it correctly to answer your queries.

  • Since columns are written in sorted order, you can query a range from column A to column N with a slice query (which is very fast). You can also use composite columns to abstract this mechanism a bit.

  • You can use secondary indexes on columns where you have low cardinality--this gives you query-by-value functionality.

  • You can create your own indexes where the data is sorted the way you need it.
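
To make the slice idea concrete, here is a toy in-memory model of a single row whose columns are kept sorted by name. It is a sketch of the concept only, not a Cassandra client API: the WideRow class and its methods are invented for this example. It shows why a range ("greater than") question becomes cheap once the value you want to range over is the column name.

```python
from bisect import bisect_left, bisect_right

class WideRow:
    """Toy stand-in for one row whose columns are kept sorted by column name."""

    def __init__(self):
        self._names = []    # column names, always sorted
        self._values = {}   # column name -> value

    def insert(self, name, value):
        # Place new column names at their sorted position; overwrite existing ones.
        if name not in self._values:
            self._names.insert(bisect_left(self._names, name), name)
        self._values[name] = value

    def slice(self, start, finish):
        """Return all (name, value) pairs with start <= name <= finish, in order."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, finish)
        return [(n, self._values[n]) for n in self._names[lo:hi]]

# Column names are event timestamps, so a time range is a contiguous slice.
row = WideRow()
for ts, event in [(1548250380, "login"), (1548201600, "signup"), (1548260000, "logout")]:
    row.insert(ts, event)

first_two = row.slice(1548201600, 1548250380)
after_noon = row.slice(1548250000, float("inf"))
```

The same trick underlies hand-built indexes: you write a second row whose column names are the values you want to search or sort by.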

Regarding grouping:

I presume you're referring to creating aggregates. If you need your data in real-time, you'll want to use some external mechanism (like Storm) to track data and constantly update your relevant aggregates into a CF. If you are creating aggregates as part of a batch process, Cassandra has excellent integration with Hadoop, allowing you to write map/reduce jobs in Pig, Hive, or directly in your language of choice.
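
The real-time option amounts to updating the aggregate at write time instead of grouping at read time. The following is a minimal sketch of that idea, assuming a hypothetical RunningAggregates class that stands in for an aggregates CF; it does not use Storm or Cassandra, and the group names and amounts are invented.

```python
class RunningAggregates:
    """In-memory stand-in for an aggregates CF: one row per group, updated on write."""

    def __init__(self):
        self._rows = {}

    def record(self, group, amount):
        # Update the group's running count and total as each event arrives.
        row = self._rows.setdefault(group, {"count": 0, "total": 0})
        row["count"] += 1
        row["total"] += amount

    def read(self, group):
        # Reading is a single lookup; no scan over the raw events is needed.
        return dict(self._rows.get(group, {"count": 0, "total": 0}))

agg = RunningAggregates()
for country, amount in [("CH", 30), ("DE", 12), ("CH", 8)]:
    agg.record(country, amount)

swiss = agg.read("CH")
```

The trade-off is that you must decide which groupings you need before the data arrives, which is the same "model for your queries" rule described at the top of this answer.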

▲ chillily
#3 · 2019-01-23 13:33

You may want to check out PlayOrm. While I agree you need to break out of RDBMS thinking, sometimes having your primary key as userid is just the wrong choice. Sometimes it is the right choice (it depends on your requirements).

PlayOrm is a mix of NoSQL and relational concepts, since you need both, and you can do Scalable-SQL with joins and everything. You just need to partition the tables you believe will grow into the billions or trillions of rows, and then you can query into those partitions. Even with CQL, you need to partition your tables. What can you partition by? Time is good for some use cases. Others can be partitioned by client, as each client is really a mini-database in your NoSQL cluster.

As far as keys go, PlayOrm generates unique "cluster" keys of the form hostname-uniqueidinThatHost, basically like a TimeUUID except quite a bit shorter and more readable, as we use hostnames in our cluster of a1, a2, a3, etc.
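
The hostname-plus-local-id scheme can be sketched as follows. This is not PlayOrm's actual implementation; HostKeyGenerator is a hypothetical class written only to illustrate the key format, and it assumes every host has a distinct hostname. The counter here lives in memory, so a real generator would also need to survive restarts (for example by persisting the counter or mixing in a timestamp).

```python
import itertools
import threading

class HostKeyGenerator:
    """Produces keys like 'a1-1', 'a1-2', ... that are unique per host."""

    def __init__(self, hostname: str):
        self._hostname = hostname
        self._counter = itertools.count(1)
        self._lock = threading.Lock()

    def next_key(self) -> str:
        # The lock keeps ids unique when several threads on one host ask at once.
        with self._lock:
            n = next(self._counter)
        return f"{self._hostname}-{n}"

gen_a1 = HostKeyGenerator("a1")
gen_a2 = HostKeyGenerator("a2")
keys = [gen_a1.next_key(), gen_a1.next_key(), gen_a2.next_key()]
```

Keys from different hosts never collide because the hostname prefix differs, so no cluster-wide counter or lock is required.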

女痞
#4 · 2019-01-23 14:00

To your first question:

Can I make an incremental key like in MySQL?

No, not really; auto-increment keys are not native to Cassandra. See the question "How to create auto increment IDs in Cassandra". You could also check here for more information: http://srinathsview.blogspot.ch/2012/04/generating-distributed-sequence-number.html

Your second question is more about how you store and model your Cassandra data.

Check out Stack Overflow's search option. There are lots of interesting questions:

  1. Switching from MySQL to Cassandra - Pros/Cons?
  2. Cassandra Data Model
  3. Cassandra/NoSQL newbie: the right way to model?
  4. Apache Cassandra schema design
  5. Knowledge sources for Apache Cassandra

Most importantly, When NOT to use Cassandra?
