query for a set in a relational database

I would like to query a relational database if a set of items exists.

The data I am modeling are of the following form:

key1 = [ item1, item3, item5 ]
key2 = [ item2, item7 ]
key3 = [ item2, item3, item4, item5 ]
...

I am storing them in a table with the following schema

CREATE TABLE sets (key INTEGER, item INTEGER);

So for example, the following insert statements would insert the above three sets.

INSERT INTO sets VALUES ( key1, item1 );
INSERT INTO sets VALUES ( key1, item3 );
INSERT INTO sets VALUES ( key1, item5 );
INSERT INTO sets VALUES ( key2, item2 );
INSERT INTO sets VALUES ( key2, item7 );
INSERT INTO sets VALUES ( key3, item2 );
INSERT INTO sets VALUES ( key3, item3 );
INSERT INTO sets VALUES ( key3, item4 );
INSERT INTO sets VALUES ( key3, item5 );

Given a set of items, I would like the key associated with the set if it is stored in the table and NULL if it is not. Is it possible to do this with an sql query? If so, please provide details.

Details that may be relevant:

I am primarily interested in the database design / query strategy, though I will eventually implement this in MySQL and preform the query from with in python using the mysql-python package.
I have the freedom to restructure the database schema if a different layout would be more convenient for this type of query.
Each set, if it exists is supposed to be unique.
I am not interested in partial matches.
The database scale is on the order of < 1000 sets each of which contains < 10 items each, so performance at this point is not a priority.

Thanks in advance.

标签： mysql database set

4条回答

甜甜的少女心

2楼-- · 2019-02-18 18:16

To simplify collapsar's solution, which was already simplified by Aleksi Torhamo:

It isn't necessary to get all keys that DO NOT MATCH, which could be large, just get the ones that do match and call them partial matches.

-- get all partial matches
CREATE TEMPORARY VIEW partial_matches AS
SELECT DISTINCT key FROM sets WHERE item IN (1,3,5);

-- filter for full matches
SELECT sets.key
FROM  sets, partial_matches
WHERE sets.key = partial_matches.key
GROUP BY sets.key HAVING COUNT(sets.key) = 3;

0人赞添加讨论(0) 举报

我欲成王，谁敢阻挡

3楼-- · 2019-02-18 18:17

aleksis solution requires an specific query for every posssible item set. the following suggestion provides a generic solution in the sense that the item set to be queried can be factored in as a result set of another query - just replace the set containment operators by a suitable subquery.

     SELECT CASE COUNT(ffffd.key) WHEN 0 THEN NULL ELSE MIN(ffffd.key) END
       FROM (
                 SELECT s4.key
                      , COUNT(*) icount
                   FROM sets s4
                   JOIN (
                          SELECT DISTINCT d.key
                            FROM (
                                   SELECT s1.key
                                     FROM sets s1
                                    WHERE s1.item IN ('item1', 'item3', 'item5')
                                    MINUS
                                   SELECT s2.key
                                     FROM sets s2
                                    WHERE s2.item NOT IN ('item1', 'item3', 'item5')
                                 ) d    
                         ) dd ON ( dd.key = s4.key )
                GROUP BY s4.key
             ) ffffd
       WHERE ffffd.icount = (
                             SELECT COUNT(*)
                               FROM (
                                      SELECT DISTINCT s3.item
                                        FROM sets s3
                                       WHERE s3.item IN ('item1', 'item3', 'item5')
                                    )
                          )
           ;

the result set dd delivers a candidate set of keys who do not asscociate with other items than those from the set to be tested. the only ambiguity may arise from keys who reference a proper subset of the tested item set. thus we count the number of items associated with the keys of dd and choose that key where this number matches the cardinality of the tested item set. if such a key exists it is unique (as we know that the item sets are unique). the case expression in the outermost select is just a fancy way to guarantee that their will be no empty result set, i.e. a null value will be returned if the item set is not represented by the relation.

maybe this solution will be useful to you,

best regards

carsten

0人赞添加讨论(0) 举报

Rolldiameter

4楼-- · 2019-02-18 18:27

This query has a well known name. Google "relational division", "set containment join", "set equality join".

0人赞添加讨论(0) 举报

狗以群分

5楼-- · 2019-02-18 18:28

I won't comment on whether there is a better suited schema for doing this (it's quite possible), but for a schema having columns name and item, the following query should work. (mysql syntax)

SELECT k.name
FROM (SELECT DISTINCT name FROM sets) AS k
INNER JOIN sets i1 ON (k.name = i1.name AND i1.item = 1)
INNER JOIN sets i2 ON (k.name = i2.name AND i2.item = 3)
INNER JOIN sets i3 ON (k.name = i3.name AND i3.item = 5)
LEFT JOIN sets ix ON (k.name = ix.name AND ix.item NOT IN (1, 3, 5))
WHERE ix.name IS NULL;

The idea is that we have all the set keys in k, which we then join with the set item data in sets once for each set item in the set we are searching for, three in this case. Each of the three inner joins with table aliases i1, i2 and i3 filter out all set names that don't contain the item searched for with that join. Finally, we have a left join with sets with table alias ix, which brings in all the extra items in the set, that is, every item we were not searching for. ix.name is NULL in the case that no extra items are found, which is exactly what we want, thus the WHERE clause. The query returns a row containing the set key if the set is found, no rows otherwise.

Edit: The idea behind collapsars answer seems to be much better than mine, so here's a bit shorter version of that with explanation.

SELECT sets.name
FROM sets
LEFT JOIN (
    SELECT DISTINCT name
    FROM sets
    WHERE item NOT IN (1, 3, 5)
) s1
ON (sets.name = s1.name)
WHERE s1.name IS NULL
GROUP BY sets.name
HAVING COUNT(sets.item) = 3;

The idea here is that subquery s1 selects the keys of all sets that contain items other that the ones we are looking for. Thus, when we left join sets with s1, s1.name is NULL when the set only contains items we are searching for. We then group by set key and filter out any sets having the wrong number of items. We are then left with only sets which contain only items we are searching for and are of the correct length. Since sets can only contain an item once, there can only be one set satisfying that criteria, and that's the one we're looking for.

Edit: It just dawned on me how to do this without the exclusion.

SELECT totals.name
FROM (
    SELECT name, COUNT(*) count
    FROM sets
    GROUP BY name
) totals
INNER JOIN (
    SELECT name, COUNT(*) count
    FROM sets
    WHERE item IN (1, 3, 5)
    GROUP BY name
) matches
ON (totals.name = matches.name)
WHERE totals.count = 3 AND matches.count = 3;

The first subquery finds the total count of items in each set and the second one finds out the count of matching items in each set. When matches.count is 3, the set has all the items we're looking for, and if totals.count is also 3, the set doesn't have any extra items.

0人赞添加讨论(0) 举报

query for a set in a relational database

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间