GAE datastore list property serialization

Posted 2019-04-16 15:55

I've watched this video from Google I/O 2009: http://www.youtube.com/watch?v=AgaL6NGpkB8, where Brett shows a microblogging example. He describes two datastore schemas:

The first one:
class Message(db.Model):
    sender = db.StringProperty()
    body = db.TextProperty()
    receivers = db.StringListProperty()

and the second one:
class Message(db.Model):
    author = db.StringProperty()
    message = db.TextProperty()

class MessageIndex(db.Model):
    receivers = db.StringListProperty()

He says that in the first example the datastore has to serialize/deserialize the receivers property every time we query messages by receiver, and in the second example it doesn't. I can't understand why the datastore behaves differently in these examples; in both cases receivers is just a StringListProperty. Can you explain that?

2 answers
可以哭但决不认输i
#2 · 2019-04-16 16:19

Note that there is a new feature, projection queries, which allows you to get a partial view of your entities, but ONLY on indexed properties.

https://developers.google.com/appengine/docs/python/datastore/projectionqueries

Internally, your entities, keys and indexes are all stored in different tables. If you get the whole entity, you have to do the lookup in the main entity table, which is expensive because it must deserialize the whole thing (and contend with any other processes mucking about in that table).

A projection query is like a key-query, except instead of your entity key, it is using a set of indexed values as a key (because that's how the index tables operate internally). If you want a subset of data and can justify paying to have it indexed (or if it's already indexed), a projection query will be fast and cheap; it only looks in the index tables and doesn't need to touch the entity or key tables.
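The difference between the two lookups can be sketched with a toy model. This is plain Python, not the App Engine API: `entity_table`, `index_table`, `put`, `full_query` and `projection_query` are hypothetical names invented for illustration, and JSON stands in for the datastore's real serialization format.

```python
import json

# Toy stand-ins for the datastore's tables (hypothetical, not the App Engine API).
entity_table = {}   # key -> serialized entity blob
index_table = []    # sorted rows of (property name, value, entity key)


def put(key, props, indexed):
    """Store the serialized entity and write one index row per indexed value."""
    entity_table[key] = json.dumps(props)
    for name in indexed:
        value = props[name]
        values = value if isinstance(value, list) else [value]
        for v in values:
            index_table.append((name, v, key))
    index_table.sort()


def full_query(name, value):
    """Scan the index for matching keys, then deserialize each whole entity."""
    keys = [k for (n, v, k) in index_table if n == name and v == value]
    return [json.loads(entity_table[k]) for k in keys]


def projection_query(name):
    """Read values straight from the index rows; entity_table is never touched."""
    return [(k, v) for (n, v, k) in index_table if n == name]


put("m1", {"sender": "alice", "body": "x" * 1000}, indexed=["sender"])
put("m2", {"sender": "bob", "body": "hi"}, indexed=["sender"])

whole_entities = full_query("sender", "bob")   # decodes the full blob, body included
senders_only = projection_query("sender")      # index rows only
```

In this sketch `projection_query` can only return properties that were passed in `indexed`, which mirrors the restriction that projection queries work only on indexed properties.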

Bombasti
#3 · 2019-04-16 16:22

In his talk, he's assuming that when you query, you want to retrieve the contents of the message - 'sender' and 'body'. In App Engine, entities are deserialized as a whole - you can't just load certain fields - so when you do a query in the first example, it has to load the entire list of receivers.

In the second example, you can do a keys-only query over MessageIndex, then fetch and load the corresponding Message entities. Because you never load any MessageIndex properties into memory, you don't need to deserialize the large and expensive list property associated with them.
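To make the cost difference concrete, here is a rough sketch in plain Python rather than the real `db` API. Everything in it (`store`, `index`, `stats`, `put`, `get`, `keys_only_query`, the kind names `Message1`/`Message2`, and the key layout) is hypothetical. It also assumes each MessageIndex is stored as a child of its Message, so the index entity's key alone identifies which Message to fetch.

```python
import json

store = {}                    # key -> serialized entity (toy "entity table")
index = {}                    # (kind, receiver) -> list of keys (toy "index table")
stats = {"bytes_decoded": 0}  # how much serialized data get() had to decode


def put(kind, key, props):
    """Serialize the entity and index every value of its receivers list."""
    store[key] = json.dumps(props)
    for receiver in props.get("receivers", []):
        index.setdefault((kind, receiver), []).append(key)


def get(key):
    """Entities are decoded as a whole; there is no per-field load."""
    blob = store[key]
    stats["bytes_decoded"] += len(blob)
    return json.loads(blob)


def keys_only_query(kind, receiver):
    """Answered from the index alone; no entity is deserialized."""
    return list(index.get((kind, receiver), []))


receivers = ["user%d" % i for i in range(1000)]

# Schema 1: the receivers list lives inside the message entity itself.
put("Message1", "msg1", {"sender": "brett", "body": "hello", "receivers": receivers})

# Schema 2: the list lives in a child entity whose key is (parent key, child id).
put("Message2", "msg2", {"author": "brett", "message": "hello"})
put("MessageIndex", ("msg2", "index"), {"receivers": receivers})


def inbox_schema1(user):
    # Each matching message is decoded together with its 1000 receivers.
    return [get(k) for k in keys_only_query("Message1", user)]


def inbox_schema2(user):
    # Only keys come back from MessageIndex; take the parent part and fetch that.
    parent_keys = [k[0] for k in keys_only_query("MessageIndex", user)]
    return [get(k) for k in parent_keys]


stats["bytes_decoded"] = 0
inbox_schema1("user7")
cost_schema1 = stats["bytes_decoded"]

stats["bytes_decoded"] = 0
inbox_schema2("user7")
cost_schema2 = stats["bytes_decoded"]
```

Both schemas index the same list, so the query itself is equally cheap. The difference is only in what gets decoded afterwards: `cost_schema1` includes the whole receivers list for every result, while `cost_schema2` covers just the author and message text.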
