Many to Many Relation Design - Intersection Table

2019-01-21 08:40发布

问题:

I'm wondering what a better design is for the intersection table for a many-to-many relationship.

The two approaches I am considering are:

CREATE TABLE SomeIntersection 
(
     IntersectionId UNIQUEIDENTIFIER PRIMARY KEY,
     TableAId UNIQUEIDENTIFIER REFERENCES TableA NOT NULL,
     TableBId UNIQUEIDENTIFIER REFERENCES TableB NOT NULL,
     CONSTRAINT IX_Intersection UNIQUE(TableAId, TableBId )
) 

or

CREATE TABLE SomeIntersection 
(
     TableAId UNIQUEIDENTIFIER REFERENCES TableA NOT NULL,
     TableBId UNIQUEIDENTIFIER REFERENCES TableB NOT NULL,
     PRIMARY KEY(TableAId, TableBId )
) 

Are there benefits to one over the other?
EDIT 2:****Please Note: I plan to use Entity Framework to provide an API for the database. With that in mind, does one solution work better with EF than the other?

EDIT: On a related note, for a intersection table that the two columns reference the same table (example below), is there a way to make the two fields differ on a record?

CREATE TABLE SomeIntersection 
(
     ParentRecord INT REFERENCES TableA NOT NULL,
     ChildRecord INT REFERENCES TableA NOT NULL,
     PRIMARY KEY(TableAId, TableBId )
)

I want to prevent the following

ParentRecord          ChildRecord
=================================
      1                    1         --Cyclical reference! 

回答1:

The second version is the best for me, it avoids the creation of an extra index.



回答2:

It's a topic of some debate. I prefer the first form because I consider it better to be able to look up the mapping row by a single value, and like to enforce a perhaps-foolish consistency of having a single-column primary key available in every table without fail. Others think that having that column is a silly waste of space.

We meet for a few rounds of bare-knuckle boxing at a bar in New Jersey every couple of months.



回答3:

Use version-1 if your "intersection" actually IS an entity on its own, meaning:

  • it has additional properties
  • you may search for those objects (and not navigating a relation)

User version-2 if it is purely N-M relation table. In which case also make sure that:

  • you have your PK (CLUSTERED) with the first column relating to the table your search more often: for example if your tables are Person-Address, then I would assume that you would search for all-addresses-of-a-person more often then for all-people-at-this-address. So you should put your PK to include PersonID first
  • you can still have another one-column UNIQUE identifier, but just:

    1. either do not make it a PK
    2. or make it a PK, but specify NON CLUSTERED, so that you can use CLUSTERED for the UNIQUE index covering two referencing columns
    3. unless you use GUIDs by your design, you could then stick to INT IDENTITY column type

In both cases you may want to create another INDEX covering the 2 columns, but in another order, if you search from the other side of the relation often.



回答4:

if you go with the first, just use an IDENTITY on the PK, you don't need to waste the space (disk and memory cache) with a UNIQUEIDENTIFIER .



回答5:

What you are building is called an "Intersection".

I have a very clear memory of my database professor in school saying that an intersection relationship is nearly always an entity in its own right, and so it's normally worth allocating space for it as such. This would indicate that former is more "correct".

That said, I personally tend to prefer the latter. It really comes down to whether you will ever retrieve one of these records directly or if you'll only use the table when joining on one of the original tables.



回答6:

From a developer's perspective view, I prefer the former. It's easier to write and test when dealing with it.

I don't need to check for two keys to retrieve a unique record.



回答7:

If you won't have any additional fields in the intersection table it really doesn't need an ID of its own, and adding one doesn't add any benefit. However, if you're going to be putting other fields into that table, and in many cases you will, it should have its own ID as the primary key.

Rule of thumb, of course, but there you go.



回答8:

Answering your second question...

You just need to add a check constraint, like this:

CREATE TABLE SomeIntersection 
(
     ParentRecord INT REFERENCES TableA NOT NULL,
     ChildRecord INT REFERENCES TableA NOT NULL,
     PRIMARY KEY(TableAId, TableBId),
     CHECK (ParentRecord <> ChildRecord)
)


回答9:

Benefits of the first one:

The ability to look up the join table by a single value. This makes some look up operations simpler on the client side.

Some ORMs (NHibernate) also require ids in each entity for persistence to work properly.

Benefits of the second one:

Simpler data model, with no need to create the additional index.

Less memory required.



回答10:

The second is better. It constrains the junction box (intersection box) to not having more than one appearance of the same pair. If there are no other columns in the junction box, you won't be doing lookups of this table anyway, except by way of the two foreign keys.



回答11:

Given that the TableAId-TableBId combination is unique and that the table is used solely to implemet a many-to-many relationship I would go with the second option. From a purely logical standpoint the fist implemtnation reduces to the second. From a structural perspective your databases needs to maintain both a primary key/index as well as a constraint on the first implementation wereas on the second it only needs to maitain the primary key/index.