Best schema design for table relationship that enf

2019-04-08 01:03发布

问题:

Given a table of models 'A', that can have multiple child models 'B', of which 'B' will have one or more child models 'C'.. this sounds simple, however I need to enforce that for each 'A', any 'B' must have a unique collection of 'C'.. for example, C cannot be a child of two 'B's that are part of the same parent 'A'.. but a 'C' can be a child of multiple 'B's given that each 'B's parent 'A' is distinct..

Does that make sense or should I unobfuscate my scenario? cheers in advance!

Note that I know this policy will be enforced in the application, but I wan't it to be impossible for the database to be in an invalid state.

Edit: hi all, fantastic feedback so firstly I must thank you all for sharing your knowledge with me.

Just to clarify the situation, I'll explain the scenario but here are some notes:

'A' has zero or more 'B', a 'B' is implicitly associated with an 'A', and as such is always a child of just one 'A'. 'C' is somewhat of a root entity that is associated with many 'B's as well as other elements in the database.


Heres the real story:

This is a website that contains many briefs (A), and many members (C), a brief can have many submissions (B), of which a submission will always have one or more associated members. The idea is that a submission can in fact be a collaboration, and each member has no more 'power' than any other, but there will be a different system in place to validate the policy of how members work together.

So, per brief, a member can only submit a single submission and a submission can have many members (collaborators).

Hope that helps, but I think you've given me plenty of help already!

Steve.

回答1:

I think that you need SQL standard assertions, which are (unfortunately) largely unimplemented by actual DBMS.

All the answers are agreeing that there are three primary tables called TableA, TableB, and TableC, each containing its own ID column:

TableA (A_ID PRIMARY KEY, ...)
TableB (B_ID PRIMARY KEY, ...)
TableC (C_ID PRIMARY KEY, ...)

It is not clear from the description of the problem whether a single B value can have multiple A parent entries. It is clear that a single C can have multiple B parent entries. If a B is tied to a single A, the design of TableB can be revised to:

TableB (B_ID, ..., A_ID REFERENCES TableA)

If a B can be associated with several different A's, then the connection is best represented by a joining table:

A_and_B (A_ID REFERENCES TableA,
         B_ID REFERENCES TableB,
         PRIMARY KEY (A_ID, B_ID)
        )

It is also not clear from the description whether the C's associated with a B must be the same for every A that the B is associated with, or whether different A's can reference the same B, and the set of C's associated with the B for A1 can be different from the set of C's associated with the B for A2. (Of course, if a single B can only be associated with one A, this issue is moot.)

For the purposes of this answer, I'm going to assume that any B is associated with a single A, so the structure of TableB includes A_ID as a foreign key. Since a single C can be associated with multiple B's, the relevant structure is a new joining table:

B_and_C (B_ID REFERENCES TableB,
         C_ID REFERENCES TableC,
         PRIMARY KEY (B_ID, C_ID)
        )

Simplifying (by omitting rules about deferrability and immediacy) an assertion looks like:

CREATE ASSERTION assertion_name CHECK ( <search_condition> )

So, once we have a set of design decisions, we can write an assertion to validate the data. Given tables TableA, TableB (with foreign key A_ID), TableC and B_and_C, the requirement is that the number of occurrences of a given C_ID across a complete A is 1.

CREATE ASSERTION only_one_instance_of_c_per_a CHECK
(
     NOT EXISTS (
         SELECT A_ID, COUNT(C_ID)
             FROM TableB JOIN B_and_C USING (C_ID)
             GROUP BY A_ID
             HAVING COUNT(C_ID) > 1
     )
)

[Amended: I think this is more accurate:

CREATE ASSERTION only_one_instance_of_c_per_a CHECK
(
     NOT EXISTS (
         SELECT A_ID, C_ID, COUNT(*)
             FROM TableB JOIN B_and_C USING (C_ID)
             GROUP BY A_ID, C_ID
             HAVING COUNT(*) > 1
     )
)

]

The set of join conditions varies with other rules for how the tables are connected, but the overall constraint structure remains the same - there must not exist more than one reference to a given C_ID for a particular A_ID.


In the comments below, meandmycode notes:

I get the feeling that there's a flaw in my design. My real world logic is that a 'B' always has at least one child 'C'. This doesn't make sense given that 'B' must exist before its child can be attached. The database would currently allow a 'B' to be attached to an 'A' without having at least ONE 'C'.. child, I'm as such going to revise 'B' so that it has a field that refers to its primary child 'C', as well as having a child collection of additional 'C's, but now I have a collection that could also include the primary 'C' specified by the 'B', which would be.. wrong.

Is there a db pattern that would infer a 'one or more children' rule, vs zero or more?

I think you do have problems with your model. It is hard to create a B if there must already exist a C that refers to the newly created B, especially if C's must only refer to existing B's. The phrase 'chicken and egg' comes to mind. So, normally, you allow B's to have zero or more C's in a context like this.

You've still not stipulated whether TableB has an A_ID foreign key or whether you have a linking table like A_and_B. If it has a foreign key, then presumably you cannot create a B until you've created the A to which it refers.

I don't think including one C ID in table B is a good idea - it makes for asymmetric processing (harder SQL). It also means that if you need to delete that one C, you have to update things so that one of the other C references is deleted from the table it is currently in, and then update the value in the B record. That's messy, to be polite about it.

I think you need to amend your question to define the actual table structure you're looking at - along the lines shown in various answers; you can use triple dots to represent other but irrelevant columns. The assertion I suggested would probably have to be implemented as some sort of trigger - which gets into DBMS-specific notations.


From the amended description of briefs (A), submissions (B) and members (C), it is clear that a single submission applies to just one brief, so that submissions can have a simple foreign key that identifies the brief it is a submission for. And a member can only collaborate on one submission for a particular brief. There will be a table of 'submission_collaborators' with columns to identify the submission and member, the combination is the primary key and each column is a foreign key.

Briefs(Brief_ID, ...)
Submissions(Submission_ID, Brief_ID REFERENCES Briefs, ...)
Members(Member_ID, ...)
Submission_Collaborators(Submission_ID REFERENCES Submissions,
                         Member_ID REFERENCES Members,
                         PRIMARY KEY (Submission_ID, Member_ID)
                        )

Hence, the requirement is that the following query must return no rows:

SELECT s.brief_id, c.member_id, COUNT(*)
    FROM submissions AS s JOIN submission_collaborators AS c
         ON s.submission_id = c.submission_id
    GROUP BY s.brief_id, c.member_id
    HAVING COUNT(*) > 1

This is the same query that I embedded in the CREATE ASSERTION (second variant). You can dig out extra information (brief title, submission title, member name, various dates, etc) as well, but the core of the issue is that the query shown must return no data.



回答2:

I think I've got your relationship model captured here; If not then I vote for unobfuscation:

  • A [ {AID}, ... ]
  • B [ {BID}, AID, ... ]
  • C [ {CID}, ... ]
  • B_C_Link [ {BID, CID}, AID ]
    • Additional Unique Index on (AID, CID)

The notation uses the {} primary key indicator. So As can have multiple Bs (by putting an AID on B), Bs can have Cs (by using a many-to-many table B_C_Link), and multiple Cs cannot belong to the same A (by adding the AID to the many-to-many table and enforcing (AID, CID) uniqueness.



回答3:

What you have is ternary relationship. What you need to do is have a table that ties A and B and C together in it's primary key. Since primary keys cannot be duplicated, that will enforce there to be only one C for each A, and also each B. This will create the unique collection you were looking for.

You get the following table structure:

A's({A_ID}, ...)
B's({B_ID}, ...)
C's({C_ID}, ...)
A_B_C_Relation({[A_ID], [B_ID], [C_ID]}, ...)

Primary keys are in the braces, foreign keys are in brackets.

Look here for more info.



回答4:

Add TableA's ID to TableB, and add it to the primary key, and do the same thing for TableB and TableC.

edit:

I believe the first part of this answer will work for the A to B constraint. However, I would then put a linking table between B and C which also held A's PK. that way you've got a 1:N between A:B, and your constraints are then enforced.



回答5:

I don't think you'll be able to do this with simple declarative referential integrity constraints. The best way to enforce the logic might be to use triggers to implement the business constraints, and rollback any insert or update which violates the rules.