Data Modelling Advice for Blog Tagging System on Google App Engine

Posted 2019-01-31 13:45

I'm wondering if anyone might provide some conceptual advice on an efficient way to build a data model for the simple system described below. I'm somewhat new to thinking in a non-relational manner and want to avoid any obvious pitfalls. It's my understanding that a basic principle is "storage is cheap, don't worry about data duplication", unlike in a normalized RDBMS.

What I'd like to model is:

A blog article which can be given 0-n tags. Many blog articles can share the same tag. When retrieving data, I would like to allow retrieval of all articles matching a tag. In many ways this is very similar to the approach taken here at Stack Overflow.

My normal mindset would be to create a many-to-many relationship between tags and blog articles. However, I suspect that in the context of GAE this would be expensive, although I have seen examples of it being done.
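To make the cost concern concrete, here is a minimal in-memory sketch of the many-to-many version, assuming a hypothetical join kind `ArticleTag` with one record per (article, tag) pair. Plain Python dicts and lists stand in for the datastore, so no App Engine API is used; the point is that answering "which articles have tag X" takes two steps: a query over the join records, then a fetch of each referenced article.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ArticleTag:
    """Hypothetical join entity: one record per (article, tag) pair."""
    article_id: int
    tag: str

# In-memory stand-ins for the two kinds of stored data.
articles = {1: "Intro to GAE", 2: "Python tips", 3: "Datastore notes"}
links = [
    ArticleTag(1, "python"),
    ArticleTag(1, "gae"),
    ArticleTag(2, "python"),
    ArticleTag(3, "gae"),
]

def articles_with_tag(links, articles, tag):
    # Step 1: find the article ids linked to this tag via the join records.
    ids = sorted({link.article_id for link in links if link.tag == tag})
    # Step 2: fetch each referenced article (a second round trip in a real store).
    return [articles[i] for i in ids]

print(articles_with_tag(links, articles, "python"))  # ['Intro to GAE', 'Python tips']
```

This design costs one extra write per tag per article and a second read step per query, which is the overhead the question is worried about.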

Perhaps I could use a ListProperty containing each tag as part of the article entities, and a second data model to track tags as they're added and deleted? This way there's no need for any relationships, and the ListProperty still allows queries where a match on any list element returns results.
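A minimal sketch of the list-property idea, with a plain Python list standing in for the `ListProperty` (the `Article` class here is hypothetical). It mirrors the datastore rule that an equality filter on a list property matches an entity when any one element equals the value; with the legacy `db` API the equivalent query would be something like `Article.all().filter('tags =', tag)`.

```python
from dataclasses import dataclass, field

@dataclass
class Article:
    """Hypothetical article; `tags` stands in for a ListProperty of strings."""
    title: str
    tags: list = field(default_factory=list)

def query_by_tag(articles, tag):
    # An entity matches if ANY element of its tag list equals the value,
    # which is how an equality filter on a list property behaves.
    return [a.title for a in articles if tag in a.tags]

posts = [
    Article("A", ["python", "gae"]),
    Article("B", ["java"]),
    Article("C", ["gae"]),
    Article("D"),  # zero tags is allowed
]
print(query_by_tag(posts, "gae"))  # ['A', 'C']
```

Each article carries its own copy of the tag strings, which is the "storage is cheap" trade-off: no join step, at the cost of duplicated tag names.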

Any suggestions on the most efficient way to approach this on GAE?

4 answers
迷人小祖宗
Reply #2 · 2019-01-31 14:33

Many-to-many sounds reasonable. Perhaps you should try it first to see if it is actually expensive.

The good thing about GAE is that it will tell you when you are using too many CPU cycles. Profiling for free!

够拽才男人
Reply #3 · 2019-01-31 14:36

Pre-computing counts is not only practical but also necessary, because the count() function returns a maximum of 1000. If write contention might be an issue, make sure to check out the sharded counter example:

http://code.google.com/appengine/articles/sharding_counters.html
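The sharded-counter idea can be sketched in pure Python as below. This is an in-memory illustration of the concept only, not the code from the linked article, and the default shard count of 20 is an arbitrary assumption. Writes go to a randomly chosen shard so that concurrent increments rarely touch the same record, and a read sums all shards.

```python
import random

class ShardedCounter:
    """In-memory sketch: the count is split across independent shards."""

    def __init__(self, num_shards=20):
        self.shards = [0] * num_shards

    def increment(self, delta=1):
        # Pick a shard at random; in the datastore each shard would be its
        # own entity, updated in its own transaction, so writers rarely collide.
        index = random.randrange(len(self.shards))
        self.shards[index] += delta

    def total(self):
        # Reading the count means summing every shard.
        return sum(self.shards)

counter = ShardedCounter(num_shards=5)
for _ in range(100):
    counter.increment()
print(counter.total())  # 100
```

More shards mean less write contention but more records to read when computing the total, which suits a per-tag count that is updated often.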

劫难
Reply #4 · 2019-01-31 14:37

Thanks to both of you for your suggestions. I've implemented a first iteration as follows. I'm not sure it's the best approach, but it's working.

Class A = Articles. Has a StringListProperty which can be queried on its list elements.

Class B = Tags. One entity per tag, also keeps a running count of the total number of articles using each tag.

Data modifications to A are accompanied by maintenance work on B. My thinking is that pre-computing counts is a good approach in a read-heavy environment.
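A minimal in-memory sketch of that maintenance step, assuming a hypothetical `TagIndex` helper that holds both kinds: each article's tag list (Class A) and a per-tag article count (Class B). Saving an article diffs its old and new tag sets, so only the tags that actually changed have their counts adjusted.

```python
from collections import Counter

class TagIndex:
    """Hypothetical stand-in for the two kinds: articles and tag counts."""

    def __init__(self):
        self.articles = {}            # article_id -> sorted tag list (Class A)
        self.tag_counts = Counter()   # tag -> number of articles using it (Class B)

    def save_article(self, article_id, tags):
        old = set(self.articles.get(article_id, []))
        new = set(tags)
        for tag in new - old:         # tags added by this save
            self.tag_counts[tag] += 1
        for tag in old - new:         # tags removed by this save
            self.tag_counts[tag] -= 1
            if self.tag_counts[tag] == 0:
                del self.tag_counts[tag]  # drop tags no article uses any more
        self.articles[article_id] = sorted(new)

    def delete_article(self, article_id):
        self.save_article(article_id, [])  # decrement all of its tags
        del self.articles[article_id]

index = TagIndex()
index.save_article(1, ["python", "gae"])
index.save_article(2, ["python"])
index.save_article(1, ["gae", "datastore"])  # retag: python -1, datastore +1
print(sorted(index.tag_counts.items()))  # [('datastore', 1), ('gae', 1), ('python', 1)]
index.delete_article(2)
print(sorted(index.tag_counts.items()))  # [('datastore', 1), ('gae', 1)]
```

In the real datastore the article and the tag counts live in separate entities, so keeping them consistent under concurrent edits would need extra care, for example transactions or the sharded counters suggested in another reply.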

聊天终结者
Reply #5 · 2019-01-31 14:41

One possible way is with Expando, where you'd add a tag like:

setattr(entity, 'tag_'+tag_name, True)

Then you could query all the entities with a tag like:

def get_all_with_tag(model_class, tag):
    return model_class.all().filter('tag_%s =' % tag, True)

Of course you have to clean up your tags to be proper Python identifiers. I haven't tried this, so I'm not sure if it's really a good solution.
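The clean-up step could look like the following sketch. The naming rule (lower-case, collapse every run of characters outside `a-z0-9` into `_`, prefix with `tag_`) is an assumption, and a plain `Entity` class stands in for a `db.Expando` instance, with the "query" done by scanning a list rather than the datastore.

```python
import re

def tag_attribute_name(tag):
    """Hypothetical rule: 'Google App Engine' -> 'tag_google_app_engine'."""
    cleaned = re.sub(r"[^a-z0-9]+", "_", tag.strip().lower()).strip("_")
    if not cleaned:
        raise ValueError("tag %r has no usable characters" % tag)
    # The 'tag_' prefix also keeps names like '3d' valid identifiers.
    return "tag_" + cleaned

class Entity:
    """Plain object standing in for a db.Expando instance."""

def add_tag(entity, tag):
    setattr(entity, tag_attribute_name(tag), True)

def get_all_with_tag(entities, tag):
    # In-memory equivalent of filtering on 'tag_<name> =' True.
    name = tag_attribute_name(tag)
    return [e for e in entities if getattr(e, name, False) is True]

post, other = Entity(), Entity()
add_tag(post, "Google App Engine")
add_tag(other, "python")
print(post.tag_google_app_engine)  # True
print(len(get_all_with_tag([post, other], "python")))  # 1
```

Any such mapping is lossy: under this rule `c#` and `c++` both become `tag_c`, so a real implementation would need a collision-free encoding.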
