Tag driven database for asynchron BI

I'm having a little problem coming up with an architecture for tag driven software I'm designing.

What I want to do is store plain text in database, which is liked to an owner and other entities. The plain text is filled with tags just like Twitters hashtags and should be searchable/indexable. That can be done application side and as a result I'm gonna have tons of small chunks of data that need to be processed for business intelligens.

No one is gonna ready the plain text it's only about the analysis which doesn't need to be consistent and can be run asynchron.

I know that Twitter uses several databases: Gizzard and Cassandra for tweets and FlockDb for relations.

I don't feel like using a hybrid to accomplish relations and I don't want to build the next social network either. What I need to do though is analytics over all tags in relation to other entities.

How can I solve the hash tag problem, or how can I process the text to make it work?

I'm really searching for a nice solution not just any solution. I really know how to create a schema for SQL.

Thanks for helping me through that database jungle.

标签： database-design tags nosql

1条回答

欢心

2楼-- · 2019-08-27 10:14

Here is a fairly standard way for representing tags in the database:

enter image description here

The application parses the RAW_TEXT, for each identified tag checks if it's already in TAG and inserts it there if it isn't. Then it just connects the post with the tags extracted from it, by inserting into TAG_POST.

Note the order of fields in the composite primary key of TAG_POST - having TAG_ID at the leading edge facilitates efficient searching for posts of a given tag. If you want to get tags of a given post, flip the order. If you want both, create a composite index in the opposite order from the PK.

If your DBMS supports clustering, TAG_POST would be a good candidate for it.

If your DBMS supports leading-edge index compression (Oracle), consider simply doing this:

enter image description here

A compressed index will store many repetitions of the same TAG_NAME efficiently, so there is no need to "outsource" it to another table just to save space.

0人赞添加讨论(0) 举报

Tag driven database for asynchron BI

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间