Implications of using /id for the partition key in

Posted 2020-08-10 19:42

Question:

In a scenario where 1000 entries (each with a unique key) enter Cosmos DB per minute, is it safe to use /id as the partition key?

In particular, there is the concept of logical partitions: https://docs.microsoft.com/en-us/azure/cosmos-db/partition-data. The graphic there scares me a little, because it shows logical partitions as actual entities (e.g. "city": "London"). With an 8-hour TTL and 1000 entries per minute, I don't necessarily want 480,000 logical partitions that Cosmos DB needs to manage.
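The 480,000 figure follows directly from the arrival rate and the TTL. A quick check, assuming a steady arrival rate and that every document expires exactly at its TTL (the variable names are illustrative only):

```python
# Steady-state number of live documents. With /id as the partition key,
# this is also the number of live logical partitions.
ENTRIES_PER_MINUTE = 1000
TTL_HOURS = 8

ttl_minutes = TTL_HOURS * 60
live_logical_partitions = ENTRIES_PER_MINUTE * ttl_minutes
print(live_logical_partitions)  # 480000
```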

What I imagine happens is that the value of the partition key is simply hashed and taken modulo the number of physical partitions; the "Logical Partition Management" section of https://docs.microsoft.com/en-us/azure/cosmos-db/partitioning-overview#choose-partitionkey indicates that this is true. Furthermore, the "Choosing a Partition Key" section suggests (but does not actually state) that /id would be a fantastic partition key: it doesn't have to worry about the 10GB limit or the throughput limit, it has no hot spots and a wide (huge) range of values, and since the application doesn't need to filter on anything except the id, cross-partition queries won't be an issue for this use case.
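The mental model described above can be sketched as a toy hash-then-modulo placement. This only illustrates the idea as imagined in the question; it is not Cosmos DB's actual routing, whose hash function and mapping of hash ranges to physical partitions are internal to the service. `PHYSICAL_PARTITIONS`, `physical_partition_for`, and the `doc-N` ids are made up for the example.

```python
import hashlib
from collections import Counter

PHYSICAL_PARTITIONS = 10  # hypothetical count, chosen for the example


def physical_partition_for(partition_key_value: str) -> int:
    """Toy placement: hash the key value, then take it modulo the partition count."""
    digest = hashlib.sha256(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PHYSICAL_PARTITIONS


# 10,000 unique ids, standing in for documents whose partition key is /id.
counts = Counter(physical_partition_for(f"doc-{i}") for i in range(10_000))
for partition, count in sorted(counts.items()):
    print(partition, count)
```

With many distinct key values the per-partition counts come out roughly even, which is the intuition behind a high-cardinality key avoiding hot spots.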

In summary, do I need to worry about the memory/CPU/etc. overhead of hundreds of thousands of partition key values (logical partitions)? The docs indicate that more partition key values are better, but don't say whether it's possible to have too many.

Answer 1:

I am from the Cosmos DB engineering team.

You don't have to worry about the number of logical partition keys that are created on a Cosmos DB collection/container. As long as the partition key is an appropriate choice for your writes (subject to a cap of 10GB per logical partition key) and for your queries, you should be good.



Answer 2:

Implications are:

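  1. best possible cardinality (every document has its own partition key value)
  2. easy, fast, and cheap document reads by id
  3. no multi-document transactions, as the transaction scope is a single partition key value
  4. queries by anything other than id will be cross-partition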
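Points 2 and 4 can be illustrated with a small in-memory model. This is a sketch that assumes the simplified hash-and-modulo placement imagined in the question; it is not the Cosmos DB SDK or its real routing, and `ToyContainer`, `point_read`, and `query_by_field` are hypothetical names.

```python
import hashlib


class ToyContainer:
    """In-memory stand-in for a container partitioned on /id (illustration only)."""

    def __init__(self, physical_partitions: int = 4):
        self.partitions = [dict() for _ in range(physical_partitions)]

    def _route(self, doc_id: str) -> int:
        digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.partitions)

    def upsert(self, doc: dict) -> None:
        self.partitions[self._route(doc["id"])][doc["id"]] = doc

    def point_read(self, doc_id: str):
        """Read by id: the key value identifies the one partition to visit."""
        partition = self._route(doc_id)
        return self.partitions[partition].get(doc_id), 1  # (doc, partitions visited)

    def query_by_field(self, field: str, value):
        """Filter on a non-key field: every partition has to be scanned."""
        hits = [d for p in self.partitions for d in p.values() if d.get(field) == value]
        return hits, len(self.partitions)


container = ToyContainer()
for i in range(100):
    container.upsert({"id": f"doc-{i}", "city": "London" if i % 2 else "Paris"})

doc, visited_read = container.point_read("doc-7")
hits, visited_query = container.query_by_field("city", "London")
print(visited_read, visited_query, len(hits))  # 1 4 50
```

The read by id touches one partition, while the filter on `city` has to fan out to all of them.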

PS. I can hardly imagine a case where nothing but reads/queries by id is needed, except maybe document caching (combined with TTL).