Designing record keys for document-oriented databa

2020-07-17 06:21发布

问题:

Our team has started development of an application backed by Couchbase DB; for every one of us it's the first experience with a no-SQL database.

We've started to define our entities and adopted the practice of using "type" prefixes, suggested by Couchbase manual:

Entity "A":
key: a#123

Entity "B":
key: b#123

But we realized that we're getting confused with choosing the strategy for creating compound document keys. We use counters a lot, and they require their own documents. Our keys have become complicated:

Daily counter "x" for entity "A":
key: cntrx#a#123-20140117

We've considered different approaches, but we're still greenhorns on the subject and would like to ask some advice.

Are hierarchical keys any good at all? Can anyone share their best practices for defining non-trivial keys?

回答1:

In our project we used hierarchical keys in a way described below: First part of a key is something like table name from RDBMS: users - represents "table"

Then each user has it's own id in example:

users:1 - "represents one user"

We used ':', because I think it just looks nicer than other delimiters. You can use any delimiter you like.

If you want to use sequential indexes like id in previous example, you'll need get them from some key, so:

users:counter - key that holds "last user id" (it acts like autoincrement)

If you need to store some "subsection" for user account, you can store it:

users:<user's id>:subsection.

More complex example

users:1:avatars:1:url - means that by this key we will get avatar url of user 1, but if user wants to store many avatars they will go under users:1:avatars:X:url, where X is will be a value of users:1:avatars:counter key.

We used this strategy for all docs, that store only one value, JSON or even binary data.

So exactly for your example, I'll chose:

a:123-20140117:counter - that will mean that we have (speaking in RDBMS language) table named "a", in table "a" we have record with id (or something else) "123-20140117" that has field "cntrx".

UPD: About key size. Actually it doesn't matter. Yes keys are limited in size, but there are a lot of ways to reduce it. One of them - use hashes, but I think it's bad way, because keys will be long and consume more memory. In our project we used "short" keys for memcached bucket. We had a enum (that can be also stored in couchbase) that represent human understandable key name and it's shorten value.

Example: we have some set of records: list of users that have more than 30 photos. So we have a key-value pair:

usersByPhotosCount - k:ubpc:{0}

and for 30 photos key will be k:ubpc:30.

But it's better to do such optimizations only on production. In development it's better to have understandable keys in app and database (i.e. you can create two sets of k-v pairs: normal for development, shorten and obfuscated for production and load them depending on your environment).



回答2:

I have a couple of things to suggest regarding your question.

Overall

Nosql is just as it sounds - and requires a much different mindset than previously used to design good SQL databases. For instance, a nosql database is basically a big hash-map. So, while it might be good to put thought into your keys (e.g. to make them small), remember that they are just a means to access your documents. Unless there is some specific advantage derived from having them look a certain way, they don't need to mean anything at all - there is generally always a primary lookup required first. Case in point, how often are your users going to know they need to ask for "b#123" directly upon navigating to your app? The only place I can think of this being advantageous is in a username or some other piece of data that the user would know.

Compound Keys

While the CB manual may suggest that compound keys are a good idea (and they very well might be for simple database structures), in general, key size should be as small as possible. Keys are limited to a maximum of 256 bytes. All of the keys must be stored in RAM - so the more data is in your keys, the less will be available for the rest of your data. Instead, I would suggest creating a type field in your document, then using a view to pull out objects of a specific type (or to index objects by type). This will ultimately give you greater flexibility going forward.

Counters

Your explanation of counters is rather vague, so I am making the assumption that you are using them as an auto-increment key. I would suggest that the approach needs to be changed here to get away from counters. I use unique identifiers for all keys in my database. When I do use a compound key, it is because the key itself is significant (e.g. in revision-controlled documents, I use a compound key of the document id + the date the document was saved to ensure it is unique). Even if you have several million (or even billions) of objects, you could use 12 bytes of a GUID to practically guarantee uniqueness in your document ids. This prevents a really bad bottleneck in your application when needing to save new records.