What is the best way to represent a many-to-many r

2019-03-28 12:50发布

站内文章 / 移动开发

51 0

啃猪蹄的小仙女

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have a SQL table like so:

Update: I'm changing the example table as the existing hierarchical nature of the original data (State, Cities, Schools) is overshadowing the fact that a simple relationship is needed between the items.

entities
id      name               
1       Apple     
2       Orange            
3       Banana             
4       Carrot                
5       Mushroom

I want to define two-way relationships between these entities so a user viewing one entity can see a list of all related entities.

The relationships are defined by an end user.

What is the best way to represent these relationships in the database and subsequently query and update them?

One way as I see it...

My instinct says a relationship table like so:

entity_entity
entity_id_a       entity_id_b
1                 2
5                 1
4                 1
5                 4
1                 3

That being the case, given a supplied entity_id of 4, how would one get all related records, which would be 1 and 5?

Likewise a query of entity_id = 1 should return 2, 3, 4, and 5.

Thanks for your time and let me know if I can clarify the question at all.

回答1:

Define a constraint: entity_id_a < entity_id_b.

Create indexes:

CREATE UNIQUE INDEX ix_a_b ON entity_entity(entity_id_a, entity_id_b);
CREATE INDEX ix_b ON entity_entity(entity_id_b);

Second index doesn't need to include entity_id_a as you will use it only to select all a's within one b. RANGE SCAN on ix_b will be faster than a SKIP SCAN on ix_a_b.

Populate the table with your entities as follows:

INSERT
INTO entity_entity (entity_id_a, entity_id_b)
VALUES (LEAST(@id1, @id2), GREATEST(@id1, @id2))

Then select:

SELECT entity_id_b
FROM entity_entity
WHERE entity_id_a = @id
UNION ALL
SELECT entity_id_a
FROM entity_entity
WHERE entity_id_b = @id

UNION ALL here lets you use above indexes and avoid extra sorting for uniqueness.

All above is valid for a symmetric and anti-reflexive relationship. That means that:

If a is related to b, then b is related to a
a is never related to a

回答2:

I think the structure you have suggested is fine.

To get the related records do something like

SELECT related.* FROM entities AS search 
LEFT JOIN entity_entity map ON map.entity_id_a = search.id
LEFT JOIN entities AS related ON map.entity_id_b = related.id
WHERE search.name = 'Search term'

Hope that helps.

回答3:

The link table approach seems fine, except that you might want a 'relationship type' so that you know WHY they are related.

For example, the relation between Raleigh and North Carolina is not the same as a relation between Raleigh and Durham. Additionally, you may want to know who is the 'parent' in the relationship, in case you were driving conditional drop-downs. (i.e. You select a State, you get to see the cities that are in the state).

Depending on the complexity of your requirements, the simple setup you have right now may not be sufficient. If you simply need to show that two records are related in some way, the link table should be sufficient.

回答4:

I already posted a way to do it in your design, but I also wanted to offer this separate design insight if you have some flexibility in your design and this more closely fits your needs.

If the items are in (non-overlapping) equivalence classes, you might want to make equivalence classes the basis for the table design, where everything in class is considered equivalent. The classes themselves can be anonymous:

CREATE TABLE equivalence_class (
    class_id int -- surrogate, IDENTITY, autonumber, etc.
    ,entity_id int
)

entity_id should be unique for a non-overlapping partition of your space.

This avoids the problem of ensuring proper left- or right-handed-ness or forcing an upper-right relationship matrix.

Then your query is a little different:

SELECT c2.entity_id
FROM equivalence_class c1
INNER JOIN equivalence_class c2
    ON c1.entity_id = @entity_id
    AND c1.class_id = c2.class_id
    AND c2.entity_id <> @entity_id

or, equivalently:

SELECT c2.entity_id
FROM equivalence_class c1
INNER JOIN equivalence_class c2
    ON c1.entity_id = @entity_id
    AND c1.class_id = c2.class_id
    AND c2.entity_id <> c1.entity_id

回答5:

I can think of a few ways.

A single pass with a CASE:

SELECT DISTINCT
    CASE
        WHEN entity_id_a <> @entity_id THEN entity_id_a
        WHEN entity_id_b <> @entity_id THEN entity_id_b
    END AS equivalent_entity
FROM entity_entity
WHERE entity_id_a = @entity_id OR entity_id_b = @entity_id

Or two filtered queries UNIONed thus:

SELECT entity_id_b AS equivalent_entity
FROM entity_entity
WHERE entity_id_a = @entity_id
UNION
SELECT entity_id_a AS equivalent_entity
FROM entity_entity
WHERE entity_id_b = @entity_id

回答6:

select * from entities
where entity_id in 
(
    select entity_id_b 
    from entity_entity 
    where entity_id_a = @lookup_value
)

回答7:

Based on your updated schema this query should work:

select if(entity_id_a=:entity_id,entity_id_b,entity_id_a) as related_entity_id where :entity_id in (entity_id_a, entity_id_b)

where :entity_id is bound to the entity you are querying

回答8:

My advice is that your intial table design is bad. Do not store different types of things in the same table. (First rule of database design, right up there with do not store multiple pieces of information in the same field). This is much harder to query and will cause significant performance problems down the road. Plus it would be a problem entering the data into the realtionship table - how do you know what entities would need to be realted when you do a new entry? It would be much better to design properly relational tables. Entity tables are almost always a bad idea. I see no reason at all from the example to have this type of information in one table. Frankly I'd have a university table and a related address table. It would easy to query and perform far better.