Database design for Tagging multiple types of enti

2019-03-10 11:12发布

问题:

I'm currently designing a database schema that's used to store recipes. In this database there are different types of entities that I want to be able to tag (ingredients, recipe issuers, recipes, etc). So a tag has multiple n:m relations. If I use the "three table design", this would result in tables (cross table) for every entity type (recipes, ingredients, issuers) that I have. In other words every time I introduce an entity I have to add a cross table for it.

I was thinking of creating one table which has a unique id, that all the entities refer to, and a n:m relation between the tags table and the "unique id"-table. This way there is just one cross table between the "unique id"-table and the tag table.

Just in case that some people will think this question already was asked. I already read Database Design for Tagging. And there the three table design is mentioned.

回答1:

I would say it depends on how you want to use the tags.

I would imagine you could create an additional intersection table for each entity type you want to tag, if you only search one type of entity at a time. In other words, it would be normal to say, "show me the ingredients with tag 'yummy'" but it's not clear what it would mean to say, "show me both ingredients and recipe issuers with tag 'yummy.'" In this case, having a separate intersection table per entity is fine.

But if you do need to search for all entities of all types with a given tag, then using the single "ID" table is easier. Make all the entity tables point to it with a column that you define as both a primary key and a foreign key:

CREATE TABLE Recipes (
  recipe_id INT NOT NULL PRIMARY KEY, -- not auto-generated
  FOREIGN KEY (recipe_id) REFERENCES Taggables(id)
);

The only weakness of this plan is that you can't prevent a row in both Recipes and Ingredients from pointing to the same row in Taggables.

INSERT INTO Taggables (id) VALUES (327);
INSERT INTO Recipes (recipe_id, name) VALUES (327, 'Hollandaise sauce');
INSERT INTO Ingredients (ingr_id, name) VALUES (327, 'eggs');

Do you want every tag associated with eggs to also apply to Hollandaise sauce?

I'm just pointing out this aspect of the single-table design. It may still be the best way to model your tagging, given other requirements. But you should be watchful of the potential for collision of id's in the dependent tables.



回答2:

I don't see anything wrong with having a single table for all tag assignments (as opposed to multiple tables - one for each taggable entity).

However, one important detail in your design remains ambiguous to me: if you are going to have something along these lines

- - - - - - - - - -
Tag
    ID           // PK
    Name
    ...

- - - - - - - - - -
Taggable
    ID           // PK
    ...

- - - - - - - - - -
TagAssignment
    Tag_ID       // FK -> Tag.ID
    Taggable_ID  // FK -> Taggable.ID
    ...

- - - - - - - - - -
EntityOne
    Taggable_ID  // FK -> Taggable.ID
    ...

- - - - - - - - - -
EntityTwo
    Taggable_ID  // FK -> Taggable.ID
    ...

then are your entity classes going to have their own primary keys or are you going to use EntityOne.TaggableID and EntityTwo.TaggableID as de facto primary keys for EntityOne and EntityTwo?

In most general case, I would be cautious and let entities have their own IDs:

- - - - - - - - - -
EntityOne
    ID           // PK
    Taggable_ID  // FK -> Taggable.ID (Nullable)
    ...

- - - - - - - - - -
EntityTwo
    ID           // PK
    Taggable_ID  // FK -> Taggable.ID (Nullable)
    ...

This would not require each entity to have a corresponding instance of Taggable and therefore this would not require every piece of code concerned with an entity to also be aware of tags. However, if tagging is going to be really ubiquitous in the system, and if you are sure that you won't need any other "common ancestors" for entities (that is, other than Taggable), then you might get away without "intrinsic" IDs for entities.

NB: I never tried to implement anything like this, so all my recommendations are purely theoretical. So please do not shoot me if I do not see some obvious flaws. :-)


In response to Bill Karwin's comment:

You are right: the design described above does not prevent multiple entities to refer to same Taggable. But:

  1. Like I said, all depends on requirements. If we are sure that Taggable is going to be the only "common ancestor" of entities, then it is okay to use Taggable_ID FKs as PKs for entities. But, for example, what if some entities that happen to be "taggable" also have to be "watchable" (think notifications, notification schedules, etc.) or "whatever-able" :-)? Can we cut all those "abilities" off by tying any entity hard to Taggable?

  2. If you really want to have DB-level enforcement of one-taggable-one-entity constraint... AFAIK, there is at least one common way to do that without making FKs serve as PKs: by introducing "types" of taggables (which may be useful for some other functionality anyway).

Something along these lines would let us have a cake and eat it:

- - - - - - - - - -
Taggable
    ID           // PK
    Type        
    ... 
    - - - - - - - -
    Constraint: (ID, Type) is unique


- - - - - - - - - -
EntityOne
    ID
    Taggable_ID   
    Taggable_Type // Constraint: always = 'EntityOne'
    ...
    - - - - - - - -
    FK: (Taggable_ID, Taggable_Type) -> (Taggable.ID, Taggable.Type)

Of course, all this is more complicated than just having entities tied to taggables. But I was just trying to discuss what, in my humble opinion, should be considered in addition to the narrow picture provided by the original question.



回答3:

I think you're on the right track. You have described it really good, you have a couple of different entities. You could create a table called entities witch contains all the common attributes (if there is any). So for example

Entity

  • EntityId
  • Name

Ingredient

  • EntityId
  • Amount

RecipeIssuer

  • EntityId
  • SomeOtherInformation

Now you can have a table to tag entities.



回答4:

make tables as normal for recipies, ingredients, etc.

then your tag table should look this like this: Id, Type, Tag

I'd recommend using an enum in code to distinguish the different "Types" (entities).



回答5:

Howabout this?

Types( PK:Type,set_id[,TypeDesc])

Attributes( PK:(set_id,FK:Type),Value)

PS: Bold/Italics Realy Suck



回答6:

I have a similar 'problem' on my hands too. I am developing a small product database which involves tagging and also giving tags a value (tagname: color, value: green for example).

The two main tables are items (I) and articles (A). Items are actual physical items, and articles are derieved from items. Articles are something that can be displayed on a website, and items are the ones to be stored in a warehouse. A small example of this relationship could be car parts. A radiator with known dimensions and other data can actually fit many different models and makes, that's why the item used to respresent the radiator relates to multiple articles which indicate what the radiator can fit. On the other hand we might have two different radiators available for one model, one that is a factory-new version, and one that is just remanufactured. In such a case there are two items relating to the same article.

So, I and A have an N:M relationship.

Items and articles have certain properties. For instance the radiator item might have data like condition, material, weight, height, width and thickness. The article also has some basic information like make, model, year, engine etc, but might also need some special data like chassis model, transmission type, or something else like two different fitting types that have been used on the same model. Because two items can link to one article, it means i cant just tag articles. Tagging an article with both condition values is just stupid, on the other hand tagging one item with multiple instances of a model, make or some special requirement is also not a good idea. _there are two types of properties, the first indicate what something is like, and the second type indicates what it will fit.

Tags do not have to have a value, they can just simply act as a conventional tag that's assigned to an entity.

Radiators are just an example of a simple product. We might aswell put some computer parts or clothing in our database. This means that i need to be able to put different 'tags' on two different entities, I and A.

I need to be able to implement a search for articles in a webshop. Lets say i am using a tree-based navigation where i have a category called "Used nissan radiators". The search would involve searching both the articles and items, articles have the tag Model:Nissan, and items have the tag Condition:Used. Ofcourse when the user looks at the article he will indeed see all the items associated with the article.

One of the solutions i am pondering with is a triangle database design where is a common table called tags for all the properties and tags.

We have the tables items (I), articles (A) and tags (T) They are joined with N:M relationships: I2A joins the items to the articles. T2I joins the tags to the items and might store the value for the tag or property too. T2A joins the tags to the articles and might store a value for the tag too.

On paper, this 6-table design to tackle this problem looks quite nice, but i am getting a headache on forming a decent query where i can select the articles that match a set of different tags and their values, for example: Condition=Remanufactured, Make=Nissan

What i want to be able to do, is something like www.summitracing.com. Select Departments from the left below "Shop", the select any category and you'll see how they have managed to give items some properties. They have engine size for most applications, but when looking for rims, they also have a property for the width.

Any feedback on this would be greatly appreciated, im am starting to hit the dead trying to design this.