How can I store the date with datastore?

Posted 2019-04-29 20:48

The Datastore documentation is very clear that indexing 'monotonically increasing values' (like the current Unix time) can cause "hotspots". However, it doesn't mention a good alternative, nor does it address whether storing the exact same value repeatedly (rather than increasing values) would also create "hotspots":

"Do not index properties with monotonically increasing values (such as a NOW() timestamp). Maintaining such an index could lead to hotspots that impact Cloud Datastore latency for applications with high read and write rates." https://cloud.google.com/datastore/docs/best-practices

I would like to store the time when each particular entity is inserted into the datastore. If that's not possible, storing just the date would also work.

That almost seems more likely to cause "hotspots" though, since every new entity for 24 hours would get added to the same index (that's my understanding anyway).

Perhaps there's something more going on with how indexes work (I am having trouble finding good explanations of exactly how they work), and having the same value indexed over and over again is fine, but incrementing values is not.

I would appreciate it if anyone has an answer to this question, or else better documentation for how Datastore indexes work.

3 answers
走好不送
#2 · 2019-04-29 21:03

Is your application actually planning on querying the date? If not, consider simply not indexing that property. If you only need to read that property infrequently, consider writing a mapreduce rather than indexing.
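As a rough illustration of "not indexing that property": in the Datastore REST API, each property value carries an `excludeFromIndexes` flag. The sketch below only builds such a value as a plain dictionary with the standard library. The helper name `unindexed_timestamp` is hypothetical, and actually sending the entity to Datastore is left out.

```python
from datetime import datetime, timezone


def unindexed_timestamp(moment: datetime) -> dict:
    """Build a REST-style property value that is stored but not indexed.

    Hypothetical helper for illustration; expects a timezone-aware datetime.
    """
    # Normalize to UTC and format as RFC 3339 with a trailing "Z".
    utc = moment.astimezone(timezone.utc)
    return {
        "timestampValue": utc.strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
        # With this flag set, the value never enters a Datastore index,
        # so it cannot contribute to an index hotspot (and cannot be queried).
        "excludeFromIndexes": True,
    }


created = unindexed_timestamp(datetime(2019, 4, 29, 20, 48, tzinfo=timezone.utc))
```

The trade-off is the one the answer points out: an unindexed property can still be read from the entity, but it cannot be used in query filters or sort orders.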

That advice is given due to the way BigTable tablets work, which is described here: https://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/

To the best of my knowledge, it's more important to have the primary key of an entity not be a monotonically increasing number. It would be better to have a string key, so the entity can be stored with better distribution.
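One way to get a non-sequential string key, as a minimal sketch: generate a random UUID and use its hex form as the key name. The function name `new_key_name` is made up for this example, and UUIDs are just one possible choice of well-distributed string.

```python
import uuid


def new_key_name() -> str:
    """Return a random 32-character hex string to use as an entity key name."""
    # uuid4 values are random, so consecutive names are not ordered
    # and do not cluster at one end of the key range.
    return uuid.uuid4().hex


names = [new_key_name() for _ in range(3)]
```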

But saying this as a non-expert, I can't imagine that indexes on individual properties with monotonic values would be as problematic, if it's legitimately needed. I know with the Nomulus codebase for example, we had a legitimate need for an index on time, because we wanted to delete commit logs older than a specific time.

One cool thing I think happens with these monotonic indexes is that, when these tablet splits don't happen, fetching the leftmost or rightmost element in the index actually has better latency properties than fetching stuff in the middle of the index. For example, if you do a query that just grabs the first result in the index, it can actually go faster than a key lookup.

对你真心纯属浪费
#3 · 2019-04-29 21:10

There is a key quote in the page that Justine linked to that is very helpful:

"As a developer, what can you do to avoid this situation? ... Lower your write rate, or figure out how to better distribute values."

It is OK to store an indexed timestamp as long as that entity has a low write rate.

If you have an entity where you want to store an indexed time stamp and the entity has a high write rate, then the solution is to split the entity into two entities. Entity A will have properties that need to be updated frequently and entity B will have the time stamp and properties that don't get updated often.

When I do this, I have a common ID for the two entities to make it really easy to get from one to the other.
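The split can be sketched roughly like this, using plain dictionaries as stand-ins for Datastore entities. The kind names `PlayerHot` and `PlayerMeta`, the property names, and the helper `split_entity` are all hypothetical and only illustrate the shared-ID idea.

```python
from datetime import datetime, timezone


def split_entity(entity_id: str, properties: dict, hot_fields: set) -> tuple:
    """Split one logical record into a frequently written part and a rarely written part."""
    hot = {k: v for k, v in properties.items() if k in hot_fields}
    cold = {k: v for k, v in properties.items() if k not in hot_fields}
    # Both parts share the same ID under different (hypothetical) kinds,
    # so either one can be looked up directly from the other.
    return (("PlayerHot", entity_id), hot), (("PlayerMeta", entity_id), cold)


record = {
    "score": 10,  # updated often
    "created": datetime(2019, 4, 29, 21, 10, tzinfo=timezone.utc),  # written once
    "name": "alice",
}
(hot_key, hot_props), (meta_key, meta_props) = split_entity("p-123", record, {"score"})
```

Here only the `PlayerMeta` part would carry the indexed timestamp, so frequent writes to `score` never touch the timestamp index.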

再贱就再见
#4 · 2019-04-29 21:23

You could try storing just the date and putting random hours, minutes, and seconds into the timestamp, then throwing away that extra data later (or, for example, keeping the hours and minutes and using random seconds). I'm not 100% sure this would work, but if you need to index the date it's worth trying.
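A minimal sketch of that idea, using only the standard library. The function names are invented for this example, and whether the randomization actually avoids hotspots is untested here, as the answer itself says.

```python
import random
from datetime import date, datetime, time
from typing import Optional


def jittered_timestamp(day: date, rng: Optional[random.Random] = None) -> datetime:
    """Combine a real date with a random time of day."""
    if rng is None:
        rng = random.Random()
    # The time part is noise: it only spreads values within the day.
    noise = time(rng.randrange(24), rng.randrange(60), rng.randrange(60))
    return datetime.combine(day, noise)


def original_date(stamp: datetime) -> date:
    """Throw away the random time and recover the date that was stored."""
    return stamp.date()


stamp = jittered_timestamp(date(2019, 4, 29))
```

A query for a given day would then filter on the range from midnight of that day up to, but not including, midnight of the next day.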
