I'm a total App Engine newbie, and I want to confirm my understanding of the high replication datastore.
The documentation says that entity groups are a "unit of consistency", and that all data is eventually consistent. Along the same lines, it also says "queries across entity groups can be stale".
Can someone provide some examples where queries can be "stale"? Is it saying I could potentially save an entity without any parent (i.e. its own group), then query for it very soon after and not find it? Does it also imply that if I want data to be always 100% up-to-date I need to save them all in the same entity group?
Is the common workaround for this to use memcache to cache entities for a period of time longer than the average time it takes for data to become consistent across all data centers? What's the ballpark latency for that?
Thanks
Correct. Technically, this is the case for the regular Master-Slave datastore, too, as indexes are updated asynchronously, but in practice the window of time in which that could happen is so incredibly small you never see it.
If by "query" you mean "do a get by key", though, that will always return strongly consistent results in either implementation.
You'll need to define what you mean by "100% up-to-date" before it's possible to answer that.
No. Memcache is strictly for improving access times; you shouldn't use it in any situation where cache eviction will cause trouble.
Strongly consistent gets are always available to you if you need to guarantee that you're seeing the latest version. Without a concrete example of what you're trying to do, though, it's difficult to provide a recommendation.
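To make the difference between a get by key and a query concrete, here is a toy in-memory model. It is not the App Engine API: the ToyDatastore class and all of its method names are made up for illustration, and the index delay is simulated by an explicit method call rather than by real replication.

```python
class ToyDatastore:
    """Toy model only; every name here is hypothetical, not App Engine's API."""

    def __init__(self):
        self._entities = {}   # key -> entity, written synchronously by put()
        self._index = set()   # keys that queries can currently see
        self._pending = []    # keys whose index update has not been applied yet

    def put(self, key, entity):
        # The entity itself is stored immediately; the index entry lags behind.
        self._entities[key] = entity
        self._pending.append(key)

    def get(self, key):
        # A get by key reads the entity directly, so it always sees the latest put().
        return self._entities.get(key)

    def query_all(self):
        # A query reads through the index, so it can miss recent writes.
        return [self._entities[k] for k in sorted(self._index)]

    def apply_pending_index_updates(self):
        # Stand-in for the index catching up some time after the write.
        self._index.update(self._pending)
        self._pending = []


store = ToyDatastore()
store.put("post1", {"author": "bob", "title": "Hello"})

fresh = store.get("post1")           # sees the new entity straight away
stale_results = store.query_all()    # index has not caught up yet
store.apply_pending_index_updates()
later_results = store.query_all()    # now the query sees it too

print(fresh)
print(stale_results)
print(later_results)
```

In this model the get returns the entity immediately, the first query returns an empty list, and the second query returns the post once the simulated index update has run.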
Obligatory blog example setup: Authors have Posts.
The first thing to remember is that regular get/put/delete on a single entity group (including a single entity) will work as expected.
You will only be able to notice inconsistency if you start querying across multiple entity groups. Unless you have specified a parent attribute, all your entities are in separate entity groups. So if it is important that bob can see his own post straight after creating it, we should be careful with a plain query for his posts: fetched_posts might contain the latest post1 from bob, but it might not. This is because the Posts are not all in the same entity group. When querying like this in HR you should think "fetch me probably the latest posts for bob".
Since it is important in our application that the author can see his post in the list straight after creating it, we will use the parent attribute to tie them together, and use an ancestor query to fetch the posts only from within that group. Now we know that post2 will be in our bobs_posts results.
If the aim of our query was to fetch "probably all the latest posts + definitely latest posts by bob" we would need to do another query.
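A rough sketch of how the two result sets could be combined once both queries have run. Plain dictionaries stand in for datastore entities here, and the "key" and "created" fields, the sample data, and the merge_posts helper are all assumptions made for illustration, not part of the original code.

```python
def merge_posts(bobs_posts, other_posts):
    """Combine two result lists, keeping one copy of each post, newest first."""
    merged = {}
    # bobs_posts is processed last so that, for a post present in both lists,
    # the copy from the strongly consistent ancestor query is the one kept.
    for post in other_posts + bobs_posts:
        merged[post["key"]] = post
    return sorted(merged.values(), key=lambda p: p["created"], reverse=True)


# Hypothetical results: bobs_posts from the ancestor query (up to date),
# other_posts from the query across entity groups (possibly stale).
bobs_posts = [
    {"key": "post1", "author": "bob", "created": 1},
    {"key": "post2", "author": "bob", "created": 3},
]
other_posts = [
    {"key": "post1", "author": "bob", "created": 1},
    {"key": "post9", "author": "alice", "created": 2},
]

latest = merge_posts(bobs_posts, other_posts)
print([p["key"] for p in latest])
```

Deduplicating by key matters because a post that has already become visible to the cross-group query will show up in both lists.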
Then merge the results other_posts and bobs_posts together to get the desired result.
Having just migrated my app over from the Master/Slave to the High Replication datastore, I have to say that in practice, eventual consistency isn't a problem for most applications.
Consider the classic guestbook example, where you put() a new guestbook post entity and then immediately query all the posts in the guestbook. With the High Replication datastore, you may not see the new post appear in the query results until a few seconds later (at Google I/O, the Google engineers said that the lag was on the order of 2-5 seconds).
Now, in practice, your guestbook app is probably doing an AJAX post of the new guestbook post entry. There is no need to refetch all the posts after submitting the new post. The webapp can simply insert the new entry into the UI once the AJAX request has succeeded. By the time the user leaves the webpage and returns to it, or even hits the browser refresh button, several seconds will have elapsed, and it is very likely that the new post will be returned by the query that pulls in all the guestbook posts.
Finally, note that eventual consistency applies only to queries. If you put() an entity and immediately call db.get() to fetch it back, the result is strongly consistent, i.e. you will get the latest snapshot of the entity.