My question is about modelling one-to-many relations in ndb. I understand that this can be done in (at least) two different ways: with a repeated property or with a 'foreign key'. I have created a small example below. Basically we have an Article which can have an arbitrary number of Tags. Let's assume that a Tag can be removed but cannot be changed after it has been added. Let's also assume that we don't worry about transactional safety.
My question is: what is the preferred way of modelling these relationships?
My considerations:
- Approach (A) requires two writes for every tag that is added to an article (one for the Article and one for the Tag) whereas approach (B) only requires one write (just the Tag).
- Approach (A) leverages ndb's caching mechanism when fetching all Tags for an Article, whereas approach (B) requires a query (and, additionally, some custom caching).
Are there some things that I'm missing here, any other considerations that should be taken into account?
Thanks very much for your help.
Example (A):

    from google.appengine.ext import ndb


    class Article(ndb.Model):
        title = ndb.StringProperty()
        # some more properties
        tags = ndb.KeyProperty(kind="Tag", repeated=True)

        def create_tag(self):
            # requires two writes: one for the Tag, one for the Article
            tag = Tag(name="my_tag")
            tag.put()
            self.tags.append(tag.key)  # store the Tag's key, not the entity
            self.put()

        def get_tags(self):
            return ndb.get_multi(self.tags)


    class Tag(ndb.Model):
        name = ndb.StringProperty()
        user = ndb.KeyProperty(kind="User")  # User that created the tag
        # some more properties

Example (B):

    from google.appengine.ext import ndb


    class Article(ndb.Model):
        title = ndb.StringProperty()
        # some more properties

        def create_tag(self):
            # requires one write
            tag = Tag(name="my_tag", article=self.key)
            tag.put()

        def get_tags(self):
            # obviously we could cache this query in memcache
            return Tag.gql("WHERE article = :1", self.key)


    class Tag(ndb.Model):
        name = ndb.StringProperty()
        article = ndb.KeyProperty(kind="Article")
        user = ndb.KeyProperty(kind="User")  # User that created the tag
        # some more properties

Approach (A) should be preferred in most situations. While there are two writes required to add a tag, this is probably much less frequent than reading the tags. As long as you don't have a huge number of tags, they should all fit into the repeated Key property.
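To make the write/read trade-off concrete, here is a minimal sketch that counts operations for both approaches. It uses a plain-Python in-memory stand-in rather than ndb itself, so `FakeStore`, `add_tag_a`, `add_tag_b` and the other helpers are hypothetical names for illustration only, and keys are modelled as simple `(kind, id)` tuples.

```python
# Plain-Python stand-in for a datastore. This is NOT the ndb API; FakeStore
# and the helper functions below are hypothetical names for illustration.
class FakeStore(object):
    def __init__(self):
        self.entities = {}  # maps (kind, id) key tuples to property dicts
        self.writes = 0     # number of put() calls
        self.queries = 0    # number of query() calls

    def put(self, key, props):
        self.entities[key] = props
        self.writes += 1

    def get_multi(self, keys):
        # Direct lookups by key, analogous to fetching entities by key.
        return [self.entities[k] for k in keys]

    def query(self, kind, prop, value):
        # Filters by property value; a real datastore would consult an index.
        self.queries += 1
        return [p for (k, _), p in self.entities.items()
                if k == kind and p.get(prop) == value]


def add_tag_a(store, article_key, tag_id, name):
    # Approach (A): write the Tag, then rewrite the Article with the new key.
    tag_key = ("Tag", tag_id)
    store.put(tag_key, {"name": name})
    article = store.entities[article_key]
    article["tags"].append(tag_key)
    store.put(article_key, article)


def get_tags_a(store, article_key):
    # Approach (A): read the tags by key, no query involved.
    return store.get_multi(store.entities[article_key]["tags"])


def add_tag_b(store, article_key, tag_id, name):
    # Approach (B): a single write; the Tag points back at its Article.
    store.put(("Tag", tag_id), {"name": name, "article": article_key})


def get_tags_b(store, article_key):
    # Approach (B): reading the tags needs a query on Tag.article.
    return store.query("Tag", "article", article_key)


article_key = ("Article", 1)
store_a, store_b = FakeStore(), FakeStore()
store_a.put(article_key, {"title": "Hello", "tags": []})
store_b.put(article_key, {"title": "Hello"})
writes_before_a, writes_before_b = store_a.writes, store_b.writes

for tag_id, name in enumerate(["python", "ndb", "gae"]):
    add_tag_a(store_a, article_key, tag_id, name)
    add_tag_b(store_b, article_key, tag_id, name)

tag_writes_a = store_a.writes - writes_before_a  # two writes per tag
tag_writes_b = store_b.writes - writes_before_b  # one write per tag
names_a = [t["name"] for t in get_tags_a(store_a, article_key)]
names_b = [t["name"] for t in get_tags_b(store_b, article_key)]
```

Both stores end up returning the same tags; the difference is that (A) pays an extra write each time a tag is added, while (B) pays a query each time the tags are read. If reads dominate, as assumed above, that favours (A).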
As you mentioned, fetching the tags by their keys is much faster than performing a query. Also, if you only need the tag's name and the user, you could create the tag with the User as the parent key and the name as the tag's id. To create this tag, you would pass the User key as parent and the name as id to the Tag constructor.

Then, when you retrieve the tags, each key you stored in Article.tags would contain the User key and the Tag name! This would save you from reading in the Tag to get those values.

Have you looked at the following about using Structured Properties: https://developers.google.com/appengine/docs/python/ndb/properties#structured ? The short discussion there about Contact and Address may simplify your problem. Also look at https://developers.google.com/appengine/docs/python/ndb/queries#filtering_structured_properties. The discussions are very short.

Also, looking ahead to the fact that joins are not allowed, option A looks better.

As stated before, there are no joins in Datastore, so the whole "foreign key" notion doesn't apply. What can be done is to use the Query class to query your datastore for the correct Tag.
For example, if you are using Endpoints, then:
And then, during the request, do: