I am designing a Google Datastore schema for the classic 'User Posts' and 'Tags' problem.
This page suggests the Relation Index Entities model. Basically, it puts the searchable tags or keywords in a list property on a child entity used for filtering, and keeps the other necessary properties on the parent entity. To my understanding, the point of this approach is to reduce serialization overhead at query time.
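    from google.appengine.ext import db

    class Post(db.Model):
        title = db.StringProperty()
        post_date = db.DateTimeProperty()

    class Tags(db.Model):
        tags = db.StringListProperty()

    # post is an existing Post entity; many_tags is a list of tag strings
    mytags = Tags(parent=post, tags=many_tags)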
- Given that projection queries can fetch a subset of properties, are Relation Index Entities still necessary to reduce the serialization overhead of list properties?
  Note: projection queries have limits; Relation Index Entities don't.
- Do Relation Index Entities make queries more difficult? Say I want to find the posts tagged 'cars' that were created within the last 7 days. tags and post_date are in different kinds; is there an easy way to do that?
- Regarding exploding indexes, do Relation Index Entities reduce the chance of exploding indexes, since they put the list properties in a different kind?
Thanks for answering in advance.
The Relation Index Entity solution reduces the serialization overhead for any type of access to the `Post` entities, including ops like `key.get()`, `entity.put()` or running non-projection queries, while projection queries only do that for, well, fetching the respective query results.
Yes, queries are a bit more difficult. For your example you'll need separate queries, one for each entity kind.
The examples assume `ndb`, not `db`. I'd use keys-only queries as they're cheaper and faster.
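A datastore-free sketch of the two-query approach may help make the steps concrete. It models keys as plain tuples, so `post_key`, `tags_key`, `parent` and `recent_posts_with_tag` are all hypothetical helpers, not part of any library. In `ndb`, the two input lists would instead come from keys-only queries, for example `Tags.query(Tags.tags == 'cars').fetch(keys_only=True)` and `Post.query(Post.post_date > cutoff).fetch(keys_only=True)`, the parent lookup would be `key.parent()`, and the final entities would be loaded with `ndb.get_multi()`.

```python
def post_key(post_id):
    """Model a root Post key as a tuple path (stand-in for an ndb.Key)."""
    return ("Post", post_id)


def tags_key(post_id, tags_id):
    """Model the key of a Tags entity whose parent is the given Post."""
    return ("Post", post_id, "Tags", tags_id)


def parent(key):
    """Drop the last (kind, id) pair, mirroring what key.parent() returns."""
    return key[:-2]


def recent_posts_with_tag(tags_keys, recent_post_keys):
    """Intersect the results of the two keys-only queries.

    tags_keys: keys of Tags entities matching the tag filter.
    recent_post_keys: keys of Post entities matching the date filter.
    Returns the Post keys present in both result sets, without duplicates.
    """
    recent = set(recent_post_keys)
    seen = set()
    result = []
    for tk in tags_keys:
        pk = parent(tk)
        if pk in recent and pk not in seen:
            seen.add(pk)
            result.append(pk)
    return result


# Pretend results of the two keys-only queries.
tags_hits = [tags_key(1, 10), tags_key(2, 20), tags_key(3, 30)]
recent_hits = [post_key(2), post_key(3), post_key(4)]

matching = recent_posts_with_tag(tags_hits, recent_hits)
print(matching)  # [('Post', 2), ('Post', 3)]
```

Only the keys in `matching` would then need to be fetched as full entities, which keeps the expensive deserialization limited to the posts that satisfy both filters.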
Regarding exploding indexes: there is just one property with multiple values, `tags`, and a small number of other `Post` properties, all with single values, so the difference in exploding index impact would probably be negligible.
Splitting an entity into several smaller ones is a common technique used for other reasons as well; see, for example, "re-using an entity's ID for other entities of different kinds - sane idea?".
The same idea can be applied here: each `Tags` entity re-uses the ID of its `Post` instead of being its child.
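As a rough illustration of the shared-ID variant, the sketch below again models keys as plain tuples rather than real datastore keys, and assumes each `Tags` entity is a root entity carrying the same ID as its `Post`. The helper names `key_id` and `recent_posts_with_tag_by_id` are hypothetical. In `ndb`, the ID would come from `key.id()` and the matching post key could be rebuilt with `ndb.Key(Post, some_id)`.

```python
def key_id(key):
    """Return the ID part of a (kind, id) tuple key, mirroring key.id()."""
    return key[-1]


def recent_posts_with_tag_by_id(tags_keys, recent_post_keys):
    """Match Tags and Post keys through their shared IDs.

    tags_keys: keys of root Tags entities matching the tag filter.
    recent_post_keys: keys of Post entities matching the date filter.
    Returns the Post keys whose ID also appears among the Tags keys.
    """
    recent_ids = {key_id(k) for k in recent_post_keys}
    return [("Post", key_id(tk)) for tk in tags_keys if key_id(tk) in recent_ids]


# Pretend results of the two keys-only queries; IDs are shared across kinds.
tags_hits = [("Tags", 1), ("Tags", 2), ("Tags", 3)]
recent_hits = [("Post", 2), ("Post", 3), ("Post", 4)]

matching_by_id = recent_posts_with_tag_by_id(tags_hits, recent_hits)
print(matching_by_id)  # [('Post', 2), ('Post', 3)]
```

Because each post has exactly one `Tags` entity with the same ID, no de-duplication step is needed in this variant.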
The examples are rather simplistic; they can be optimized using `ndb` async calls and tasks/tasklets, cursors may be needed for many results, etc.