Sorry if this question is too simple; I'm only entering 9th grade.
I'm trying to learn about NoSQL database design. I want to design a Google Datastore model that minimizes the number of reads and writes.
Here is a toy example for a blog post and comments in a one-to-many relationship. Which is more efficient - storing all of the comments in a StructuredProperty or using a KeyProperty in the Comment model?
Again, the objective is to minimize the number of reads and writes to the datastore. You may make the following assumptions:
- Comments will not be retrieved independently of their respective blog post. (I suspect that this makes the StructuredProperty most preferable.)
- Comments will need to be sortable by date, rating, author, etc. (Subproperties in the datastore cannot be indexed, so perhaps this could affect performance?)
- Both blog posts and comments may be edited (or even deleted) after they are created.
Using StructuredProperty:
    from google.appengine.ext import ndb

    class Comment(ndb.Model):
        pass  # various properties...

    class BlogPost(ndb.Model):
        comments = ndb.StructuredProperty(Comment, repeated=True)
        # various other properties...
Using KeyProperty:
    from google.appengine.ext import ndb

    class BlogPost(ndb.Model):
        pass  # various properties...

    class Comment(ndb.Model):
        blogPost = ndb.KeyProperty(kind=BlogPost)
        # various other properties...
Feel free to bring up any other considerations that relate to efficiently representing a one-to-many relationship with regard to minimizing the number of reads and writes to the datastore.
Thanks.
I could be wrong, but from what I understand, a StructuredProperty is just a property within an entity, but with sub-properties.
This means reading a BlogPost and all its comments would only cost one read. So when you render your page, you only need one read op for your entire page.
Each write would be cheaper too. Adding or editing a comment takes one read op to get the BlogPost and, as long as you don't update any indexed properties, just one write op.
You can handle the comment sorting on your own after you read the entity out of the datastore.
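As a minimal sketch of that in-memory sorting, assuming the comments have already come back as part of the BlogPost entity: the Comment class below is a plain-Python stand-in rather than an ndb model, and its author, rating, and created fields are hypothetical names picked for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from operator import attrgetter


@dataclass
class Comment:
    # Plain stand-in for the ndb Comment model; field names are assumptions.
    author: str
    rating: int
    created: datetime


def sort_comments(comments, by="created", descending=False):
    """Return a new list sorted by the named attribute; the input is not modified."""
    return sorted(comments, key=attrgetter(by), reverse=descending)


comments = [
    Comment("bob", 3, datetime(2013, 5, 2)),
    Comment("alice", 5, datetime(2013, 5, 3)),
    Comment("carol", 1, datetime(2013, 5, 1)),
]

newest_first = sort_comments(comments, by="created", descending=True)
top_rated = sort_comments(comments, by="rating", descending=True)
by_author = sort_comments(comments, by="author")
```

Since the sort happens after the read, no extra datastore ops or indexes are involved; the cost is CPU time in your request handler.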
You'll have to wrap comment additions and edits in transactions so that one comment doesn't overwrite another, since they all modify the same entity. You may run into unsolvable contention problems if many people are commenting on and editing the same blog post at the same time.
In optimizing for cost though, you'll hit a wall with the maximum entity size of 1MB. This will limit the number of comments you can store per blog post.
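To get a feel for where that wall is, here is a rough back-of-the-envelope sketch. It treats the limit as 1024 * 1024 bytes and ignores per-property encoding overhead, so the real ceiling is lower; the function name and the example sizes are made up for illustration.

```python
MAX_ENTITY_BYTES = 1024 * 1024  # approximation of the 1 MB entity limit


def max_embedded_comments(avg_comment_bytes, post_bytes=0):
    """Rough upper bound on how many comments fit in one BlogPost entity.

    Ignores encoding overhead, so treat the result as optimistic.
    """
    if avg_comment_bytes <= 0:
        raise ValueError("avg_comment_bytes must be positive")
    remaining = max(MAX_ENTITY_BYTES - post_bytes, 0)
    return remaining // avg_comment_bytes


# Hypothetical numbers: a 20,000-byte post with comments averaging 500 bytes.
estimate = max_embedded_comments(avg_comment_bytes=500, post_bytes=20_000)
```

With those assumed sizes the bound works out to a couple of thousand comments, so a popular post with long comments could plausibly hit the limit.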
Going with the KeyProperty would be quite a bit more expensive.
To render a page, you'll need one read to get the blog post, plus one query, plus one small read op for each comment.
Every comment is a new entity, so it'll be at least 4 write ops. You may want to index for sort order, so that'll end up costing even more write ops.
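To make the comparison concrete, this small sketch just encodes the op counts as I've described them in this answer. It is not a pricing calculator, the function names are made up, and the real numbers depend on your indexes and on current Datastore billing.

```python
def render_ops_structured(num_comments):
    # One entity read returns the post together with all embedded comments.
    return {"reads": 1, "queries": 0, "small_ops": 0}


def render_ops_key_property(num_comments):
    # One read for the post, one query, and one small op per comment,
    # following the counts given in the answer above.
    return {"reads": 1, "queries": 1, "small_ops": num_comments}


def min_add_comment_writes(model):
    # Lower bounds from the answer: 1 write when only unindexed data in the
    # BlogPost changes, at least 4 for a new Comment entity.
    return {"structured": 1, "key_property": 4}[model]


structured = render_ops_structured(50)
keyed = render_ops_key_property(50)
```

For a post with 50 comments, the embedded version stays at a single read, while the separate-entity version grows with the number of comments.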
On the plus side, you'll have unlimited comments per blog post, and you don't have to worry about synchronizing new comments. You might need to worry about synchronization when comments are edited, but if you limit editing to the comment's creator, that shouldn't really be a problem. You don't have to do the sorting yourself either.
It's a cost vs features tradeoff.
What about:
    from google.appengine.ext import ndb

    class Comment(ndb.Model):
        pass  # various properties...

    class BlogPost(ndb.Model):
        comments = ndb.KeyProperty(Comment, repeated=True)
        # various other properties...
This way, you can store up to 5000 comments per blog post (the maximum number of repeated properties) independent of the size of each comment. You won't need a query to fetch the comments for a blog post; you can just do ndb.get_multi(blog_post.comments). And for this operation, you can try to rely on ndb's memcache. Of course, it depends on your use case whether this is a good assumption or not.
Be aware of this caveat when using a repeated StructuredProperty:
Do not use repeated properties if you have more than 100-1000 values. (1000 is probably already pushing it.) They weren't designed for such use.
See Guido's answer in "GAE ndb design, performance and use of repeated properties".
So while you may not hit the 1 MB entity limit with StructuredProperty, you may easily hit the 100-1000 suggested max.