Azure Table Partitioning Strategy

Posted 2019-07-23 07:27

Question:

I am trying to come up with a partition key strategy based on a DateTime that doesn't result in the append-only write bottleneck often described in best-practices guidelines.

Basically, if you partition by something like YYYY-MM-DD, all your writes for a particular day will end up in the same partition, which will reduce write performance.

Ideally, a partition key should evenly distribute writes across as many partitions as possible.

To accomplish this while still basing the key off a DateTime value, I need to come up with a way to assign what amounts to buckets of DateTime values, where the number of buckets is a predetermined number per time interval - say 50 per day. The assignment of a DateTime to a bucket should be as random as possible, but always the same for a given value. The reason for this is that I need to be able to always get the correct partition given the original DateTime value. In other words, this is like a hash.
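For example, a deterministic bucketing function could look like this (a sketch only; the tick-based hash and the names are illustrative, not a settled design):

    // Maps a DateTime to one of bucketCount buckets. The same input
    // always yields the same bucket, so it behaves like a hash.
    static int GetBucket(DateTime timestamp, int bucketCount)
    {
        return (int)(timestamp.Ticks % bucketCount);
    }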

Lastly, and critically, I need the partition key to be sequential at some aggregate level. So while DateTime values for a given interval, say one day, would be randomly distributed across X partition keys, all the partition keys for that day would fall within a queryable range. This would allow me to query all rows for my aggregate interval and then sort them by the DateTime value to get the correct order.
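For concreteness, a composite key of the hypothetical form yyyyMMdd-NN would satisfy both requirements: the date prefix keeps all of a day's keys in one lexical range, while the NN bucket suffix spreads writes. Building on the GetBucket sketch above, a day's rows could then be fetched with a range filter (shown here with the classic WindowsAzure.Storage TableQuery API; an untested sketch):

    // Hypothetical composite key: sequential date prefix + random-ish bucket.
    static string GetPartitionKey(DateTime timestamp, int bucketCount)
    {
        return $"{timestamp:yyyyMMdd}-{GetBucket(timestamp, bucketCount):D2}";
    }

    // Query every partition for one day via the lexical key range.
    DateTime day = new DateTime(2019, 7, 23);
    string lower = $"{day:yyyyMMdd}-00";
    string upper = $"{day:yyyyMMdd}-99";
    var filter = TableQuery.CombineFilters(
        TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.GreaterThanOrEqual, lower),
        TableOperators.And,
        TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.LessThanOrEqual, upper));
    var query = new TableQuery<DynamicTableEntity>().Where(filter);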

Thoughts? This must be a fairly well-known problem that has been solved already.

Answer 1:

How about using the millisecond component of the DateTime stamp, mod 50? That would give you a random distribution throughout the day, the value itself would be sequential, and you could easily calculate the PartitionKey in the future given the original timestamp.
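A minimal sketch of that suggestion (names are illustrative):

    // The millisecond component (0-999) mod 50 yields one of 50 buckets;
    // a given timestamp always maps back to the same bucket.
    static int MillisecondBucket(DateTime timestamp)
    {
        return timestamp.Millisecond % 50;
    }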



Answer 2:

To add to Eoin's answer, below is the code I used to simulate his solution:

    using System;
    using System.Collections.Generic;
    using System.Threading;

    var buckets = new SortedDictionary<int, List<DateTime>>();
    var rand = new Random();

    // Simulate ~1000 writes, each assigned to one of 53 buckets from
    // its tick count (53 is prime, which helps avoid aliasing with
    // regularly spaced arrival times).
    for (int i = 0; i < 1000; i++)
    {
        var dateTime = DateTime.Now;
        var bucket = (int)(dateTime.Ticks % 53);
        if (!buckets.ContainsKey(bucket))
            buckets.Add(bucket, new List<DateTime>());

        buckets[bucket].Add(dateTime);

        // Wait 0-20 ms before the next simulated request arrives.
        Thread.Sleep(rand.Next(0, 20));
    }

So this should simulate roughly 1000 requests coming in, each anywhere between 0 and 20 milliseconds apart.

This resulted in a fairly even distribution across the 53 "buckets". It also, as expected, avoided the append-only (or prepend-only) anti-pattern.
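To inspect the distribution, something like this can be appended after the loop:

    // Print how many timestamps landed in each bucket; SortedDictionary
    // iterates in key order, so the buckets print sequentially.
    foreach (var kvp in buckets)
        Console.WriteLine($"Bucket {kvp.Key,2}: {kvp.Value.Count} entries");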