Data Modelling Advice for Blog Tagging system on G

I'm wondering if anyone might provide some conceptual advice on an efficient way to build a data model for the simple system described below. I'm somewhat new to thinking in a non-relational manner and want to avoid any obvious pitfalls. It's my understanding that a basic principle is "storage is cheap, don't worry about data duplication", unlike in a normalized RDBMS.

What I'd like to model is:

A blog article which can be given 0-n tags. Many blog articles can share the same tag. When retrieving data, I'd like to be able to fetch all articles matching a tag. In many ways this is very similar to the approach taken here at Stack Overflow.

My normal mindset would be to create a many-to-many relationship between tags and blog articles. However, I'm thinking in the context of GAE that this would be expensive, although I have seen examples of it being done.

Perhaps using a ListProperty containing each tag as part of the article entities, and a second data model to track tags as they're added and deleted? This way there's no need for any relationships, and the ListProperty still allows queries where any matching list element will return results.

Any suggestions on the most efficient way to approach this on GAE?

Tags: python google-app-engine bigtable data-modeling

4 answers

迷人小祖宗

#2 · 2019-01-31 14:33

Many-to-many sounds reasonable. Perhaps you should try it first to see if it is actually expensive.

Good thing about G.A.E. is that it will tell you when you are using too many cycles. Profiling for free!


够拽才男人

#3 · 2019-01-31 14:36

Pre-computing the counts is practical. If write contention might be an issue, make sure to check out the sharded counter example.

http://code.google.com/appengine/articles/sharding_counters.html
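The idea behind a sharded counter can be sketched in plain Python, without the datastore. This is only an in-memory illustration, not the code from the linked article: the class name `ShardedCounter` and the `num_shards` parameter are made up here, and each list slot stands in for what would be a separate entity on GAE.

```python
import random


class ShardedCounter:
    """In-memory stand-in for a counter split across several shards."""

    def __init__(self, num_shards=20):
        # Each slot stands in for one datastore entity (one shard).
        self.shards = [0] * num_shards

    def increment(self, amount=1):
        # Pick a shard at random so concurrent writers rarely hit the same one.
        index = random.randrange(len(self.shards))
        self.shards[index] += amount

    def value(self):
        # Reading the total means summing every shard.
        return sum(self.shards)
```

The trade-off is that writes get spread out while reads have to touch every shard, which is usually acceptable when the total can be cached.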


劫难

#4 · 2019-01-31 14:37

Thanks to both of you for your suggestions. I've implemented (first iteration) as follows. Not sure if it's the best approach, but it's working.

Class A = Articles. Has a StringListProperty which can be queried on its list elements.

Class B = Tags. One entity per tag, also keeps a running count of the total number of articles using each tag.

Data modifications to A are accompanied by maintenance work on B. I'm thinking that pre-computing the counts is a good approach in a read-heavy environment.
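The bookkeeping between the two classes might look roughly like the sketch below. It is a datastore-free, in-memory illustration rather than the actual implementation: `TagIndex` and its method names are hypothetical, a dict of sets stands in for Class A, and a `Counter` stands in for Class B.

```python
from collections import Counter


class TagIndex:
    """Keeps per-tag article counts in step with article edits."""

    def __init__(self):
        self.articles = {}           # article_id -> set of tags (Class A stand-in)
        self.tag_counts = Counter()  # tag -> number of articles (Class B stand-in)

    def save_article(self, article_id, tags):
        new_tags = set(tags)
        old_tags = self.articles.get(article_id, set())
        # Only touch the counts of tags that were actually added or removed.
        for tag in new_tags - old_tags:
            self.tag_counts[tag] += 1
        for tag in old_tags - new_tags:
            self.tag_counts[tag] -= 1
            if self.tag_counts[tag] == 0:
                del self.tag_counts[tag]  # drop tags no article uses
        self.articles[article_id] = new_tags

    def delete_article(self, article_id):
        # Clearing the tags first releases the article's counts.
        self.save_article(article_id, [])
        del self.articles[article_id]

    def articles_with_tag(self, tag):
        # On GAE this would be a query on the list property instead of a scan.
        return sorted(a for a, tags in self.articles.items() if tag in tags)
```

On the real datastore the two updates are separate writes, so the counts can drift if one of them fails; that is where transactions or a periodic recount would come in.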


聊天终结者

#5 · 2019-01-31 14:41

One possible way is with Expando, where you'd add a tag like:

setattr(entity, 'tag_'+tag_name, True)

Then you could query all the entities with a tag like:

def get_all_with_tag(model_class, tag):
    return model_class.all().filter('tag_%s =' % tag, True)

Of course you have to clean up your tags to be proper Python identifiers. I haven't tried this, so I'm not sure if it's really a good solution.
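For the clean-up step, something like the following could work. It is an untested-on-GAE sketch in plain Python; the helper name `tag_to_property_name` and the exact normalisation rules are assumptions, chosen only to produce names that match the `'tag_' + tag_name` pattern above.

```python
import re


def tag_to_property_name(tag_name):
    """Turn a free-form tag into a name like 'tag_google_app_engine'."""
    # Lower-case, then collapse every run of other characters into one '_'.
    cleaned = re.sub(r'[^0-9a-z]+', '_', tag_name.strip().lower()).strip('_')
    if not cleaned:
        raise ValueError('tag has no usable characters: %r' % tag_name)
    # The 'tag_' prefix guarantees the name never starts with a digit.
    return 'tag_' + cleaned
```

Note that this mapping is lossy: different tags can end up with the same property name (for example "c++" and "c"), so a real version would need some rule for handling collisions.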
