I have a database with these tables:
books (primary key: bookID)
characterNames (foreign key: books.bookID)
locations (foreign key: books.bookID)
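The in-text positions of character names and locations are stored in the corresponding tables.
I am writing a Python script with psycopg2 that finds all occurrences of given character names and locations in books. I only want the occurrences in books where both the character name AND the location are found.
I already have a solution for searching for one location and one character: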
WITH b AS (
SELECT bookid
FROM characternames
WHERE name = 'XXX'
GROUP BY 1
INTERSECT
SELECT bookid
FROM locations
WHERE locname = 'YYY'
GROUP BY 1
)
SELECT bookid, position, 'char' AS what
FROM b
JOIN characternames USING (bookid)
WHERE name = 'XXX'
UNION ALL
SELECT bookid, position, 'loc' AS what
FROM b
JOIN locations USING (bookid)
WHERE locname = 'YYY'
ORDER BY bookid, position;
The CTE "b" contains all bookids in which both the character name 'XXX' and the location 'YYY' appear.
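To see what the CTE "b" contributes, here is a small runnable sketch. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL and psycopg2, and the sample rows ('Tim', 'Toolshop', the positions) are made up for illustration. Only book 1 contains both the name and the location, so only its positions are returned.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (bookid INTEGER PRIMARY KEY);
CREATE TABLE characternames (bookid INTEGER REFERENCES books, name TEXT, position INTEGER);
CREATE TABLE locations (bookid INTEGER REFERENCES books, locname TEXT, position INTEGER);
INSERT INTO books VALUES (1), (2), (3);
-- Hypothetical sample data: book 1 has both, book 2 only the name, book 3 only the location.
INSERT INTO characternames VALUES (1, 'Tim', 10), (1, 'Tim', 40), (2, 'Tim', 5);
INSERT INTO locations VALUES (1, 'Toolshop', 25), (3, 'Toolshop', 7);
""")

query = """
WITH b AS (
    SELECT bookid FROM characternames WHERE name = ?
    INTERSECT
    SELECT bookid FROM locations WHERE locname = ?
)
SELECT bookid, position, 'char' AS what
FROM b JOIN characternames USING (bookid)
WHERE name = ?
UNION ALL
SELECT bookid, position, 'loc' AS what
FROM b JOIN locations USING (bookid)
WHERE locname = ?
ORDER BY bookid, position
"""

# The same two search terms are bound twice: once for the CTE, once for the final listing.
rows = conn.execute(query, ("Tim", "Toolshop", "Tim", "Toolshop")).fetchall()
for row in rows:
    print(row)
```

With psycopg2 the placeholders would be %s instead of ?.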
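Now I also need to search for two locations and one name (or two names and one location). That is simple if all searched entities must occur in the same book, but what about this case:
Searching for: Tim, Al, Toolshop. Result: books that contain
(Tim, Al, Toolshop) or
(Tim, Al) or
(Tim, Toolshop) or
(Al, Toolshop)
The same problem can be repeated for 4, 5, 6 ... conditions.
I thought about INTERSECTing more subqueries, but that would not work.
Instead I would UNION the bookids found, GROUP them, and then select the bookids that occur more than once: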
WITH b AS (
SELECT bookid, count(bookid) AS occurrences
FROM
(SELECT DISTINCT bookid
FROM characterNames
WHERE name='XXX'
UNION
SELECT DISTINCT bookid
FROM characterNames
WHERE name='YYY'
UNION
SELECT DISTINCT bookid
FROM locations
WHERE locname='ZZZ'
GROUP BY bookid)
WHERE occurrences>1)
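I think this works, though I cannot test it at the moment. But is it the best way to do this?
The idea of using a count for the generalized case is sound. The syntax needs a couple of adjustments, though: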
WITH b AS (
SELECT bookid
FROM (
SELECT DISTINCT bookid
FROM characterNames
WHERE name='XXX'
UNION ALL
SELECT DISTINCT bookid
FROM characterNames
WHERE name='YYY'
UNION ALL
SELECT DISTINCT bookid
FROM locations
WHERE locname='ZZZ'
) x
GROUP BY bookid
HAVING count(*) > 1
)
SELECT bookid, position, 'char' AS what
FROM b
JOIN characternames USING (bookid)
WHERE name IN ('XXX', 'YYY')
UNION ALL
SELECT bookid, position, 'loc' AS what
FROM b
JOIN locations USING (bookid)
WHERE locname = 'ZZZ'
ORDER BY bookid, position;
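Notes
Use UNION ALL (not UNION) to preserve duplicates between the subqueries. You want them in this case so that you can count them.
Each subquery should produce distinct values. That works with DISTINCT the way you have it. You may want to try GROUP BY 1 instead and see whether it performs better (I don't expect it to).
The GROUP BY has to go outside the subquery. As written, it would apply only to the last subquery, where it makes no sense because you already have DISTINCT bookid.
The check for more than one hit per book has to go into a HAVING clause:
HAVING count(*) > 1
You cannot use aggregate values in a WHERE clause.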
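The first and last notes can be checked on a toy data set. The sketch below uses Python's sqlite3 as a stand-in for PostgreSQL, with hypothetical rows for the Tim / Al / Toolshop search from the question. It runs the same counting query once with UNION ALL and once with UNION. With plain UNION each bookid survives only once, so no book can ever reach a count above 1.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE characternames (bookid INTEGER, name TEXT, position INTEGER);
CREATE TABLE locations (bookid INTEGER, locname TEXT, position INTEGER);
INSERT INTO characternames VALUES
    (1, 'Tim', 10), (1, 'Al', 20),           -- book 1: Tim, Al, Toolshop
    (2, 'Tim', 5),  (2, 'Al', 8),            -- book 2: Tim, Al
    (3, 'Tim', 3),                           -- book 3: Tim only
    (4, 'Al', 12),                           -- book 4: Al, Toolshop
    (5, 'Tim', 1), (5, 'Tim', 2), (5, 'Tim', 9);  -- book 5: Tim three times, nothing else
INSERT INTO locations VALUES (1, 'Toolshop', 30), (4, 'Toolshop', 15);
""")


def matching_books(set_op):
    """Return bookids hit by more than one search term, combining subqueries with set_op."""
    sql = f"""
    SELECT bookid
    FROM (
        SELECT DISTINCT bookid FROM characternames WHERE name = 'Tim'
        {set_op}
        SELECT DISTINCT bookid FROM characternames WHERE name = 'Al'
        {set_op}
        SELECT DISTINCT bookid FROM locations WHERE locname = 'Toolshop'
    ) x
    GROUP BY bookid
    HAVING count(*) > 1
    ORDER BY bookid
    """
    return [row[0] for row in conn.execute(sql)]


with_union_all = matching_books("UNION ALL")  # duplicates across subqueries are kept and counted
with_union = matching_books("UNION")          # duplicates are folded away before counting
print(with_union_all, with_union)
```

Book 5 shows why each subquery needs DISTINCT: three occurrences of the same name must still count as a single hit.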
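Combining conditions on one table
You cannot simply combine multiple conditions on one table. How would you count the number of findings? But there is a somewhat more sophisticated way. It may or may not improve performance; you will have to test (with EXPLAIN ANALYZE). Both queries require at least two index scans on the table characterNames. At least it shortens the syntax.
Note how I compute the number of hits for characterNames, and how I changed to sum(hits) in the outer SELECT: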
WITH b AS (
SELECT bookid
FROM (
SELECT bookid
, max((name='XXX')::int)
+ max((name='YYY')::int) AS hits
FROM characterNames
WHERE (name='XXX'
OR name='YYY')
GROUP BY bookid
UNION ALL
SELECT DISTINCT bookid, 1 AS hits
FROM locations
WHERE locname='ZZZ'
) x
GROUP BY bookid
HAVING sum(hits) > 1
)
...
Casting a boolean to integer gives 0 for FALSE and 1 for TRUE. That helps.
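The boolean-to-integer trick can be tried out in isolation. This is a sketch using Python's sqlite3 in place of PostgreSQL, with made-up sample rows. It uses the standard CAST(... AS INTEGER) spelling instead of the PostgreSQL-specific ::int shorthand. The max() per name collapses repeated occurrences into a single hit, and the outer sum(hits) adds the location hit.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE characternames (bookid INTEGER, name TEXT, position INTEGER);
CREATE TABLE locations (bookid INTEGER, locname TEXT, position INTEGER);
INSERT INTO characternames VALUES
    (1, 'Tim', 10), (1, 'Al', 20),
    (2, 'Tim', 5),  (2, 'Al', 8),
    (3, 'Tim', 3),
    (4, 'Al', 12),
    (5, 'Tim', 1), (5, 'Tim', 2), (5, 'Tim', 9);
INSERT INTO locations VALUES (1, 'Toolshop', 30), (4, 'Toolshop', 15);
""")

sql = """
SELECT bookid, sum(hits) AS total
FROM (
    -- One pass over characternames: each name contributes 0 or 1 per book.
    SELECT bookid,
           max(CAST(name = 'Tim' AS INTEGER))
         + max(CAST(name = 'Al'  AS INTEGER)) AS hits
    FROM characternames
    WHERE name IN ('Tim', 'Al')
    GROUP BY bookid
    UNION ALL
    -- Each book with the location contributes exactly one more hit.
    SELECT DISTINCT bookid, 1 AS hits
    FROM locations
    WHERE locname = 'Toolshop'
) x
GROUP BY bookid
HAVING sum(hits) > 1
ORDER BY bookid
"""

totals = conn.execute(sql).fetchall()
print(totals)
```

Book 5 contains 'Tim' three times but still gets only one hit, so it is filtered out.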
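Faster with EXISTS
This kept nagging at the back of my mind while I was cycling to work. I have reason to believe that the following query might be even faster. Please give it a try: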
WITH b AS (
SELECT bookid, c_hits, l_hits
FROM (
SELECT bookid
, (EXISTS (
SELECT *
FROM characterNames c
WHERE c.bookid = b.bookid
AND c.name = 'XXX'))::int
+ (EXISTS (
SELECT *
FROM characterNames c
WHERE c.bookid = b.bookid
AND c.name = 'YYY'))::int AS c_hits
, (EXISTS (
SELECT *
FROM locations l
WHERE l.bookid = b.bookid
AND l.locname='ZZZ'))::int AS l_hits
FROM books b
) sub
-- output aliases are not visible in WHERE at the same query level, hence the subquery
WHERE (c_hits + l_hits) > 1
)
SELECT c.bookid, c.position, 'char' AS what
FROM b
JOIN characternames c USING (bookid)
WHERE b.c_hits > 0
AND c.name IN ('XXX', 'YYY')
UNION ALL
SELECT l.bookid, l.position, 'loc' AS what
FROM b
JOIN locations l USING (bookid)
WHERE b.l_hits > 0
AND l.locname = 'ZZZ'
ORDER BY 1,2,3;
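The EXISTS semi-join can stop executing at the first match. Since we are only interested in an all-or-nothing answer in the CTE, this may well do the job much faster.
This way we also do not need to aggregate (no GROUP BY necessary).
I also remember whether any characters or locations were found at all, and only revisit the tables that have actual matches.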
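Since the question asks about 4, 5, 6 ... conditions, the EXISTS variant can also be generated from lists of search terms. The following is only a sketch under assumptions: build_hits_query and its parameters are hypothetical names, sqlite3 stands in for PostgreSQL, and the sample rows are made up. Search terms are passed as bound parameters, never interpolated into the SQL text.

```python
import sqlite3


def build_hits_query(names, locations, min_hits=2, placeholder="?"):
    """Hypothetical helper: return (sql, params) for books matching at least min_hits terms."""
    if not names and not locations:
        raise ValueError("need at least one search term")
    tests, params = [], []
    for table, column, values in (
        ("characternames", "name", names),
        ("locations", "locname", locations),
    ):
        for value in values:
            # One EXISTS test per search term; each evaluates to 0 or 1.
            tests.append(
                f"CAST(EXISTS (SELECT 1 FROM {table} t "
                f"WHERE t.bookid = b.bookid AND t.{column} = {placeholder}) AS INTEGER)"
            )
            params.append(value)
    sql = (
        "SELECT bookid FROM ("
        f"SELECT b.bookid, {' + '.join(tests)} AS hits FROM books b"
        f") sub WHERE hits >= {int(min_hits)} ORDER BY bookid"
    )
    return sql, params


# Assumption: SQLite stands in for PostgreSQL; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (bookid INTEGER PRIMARY KEY);
CREATE TABLE characternames (bookid INTEGER, name TEXT, position INTEGER);
CREATE TABLE locations (bookid INTEGER, locname TEXT, position INTEGER);
INSERT INTO books VALUES (1), (2), (3), (4), (5);
INSERT INTO characternames VALUES
    (1, 'Tim', 10), (1, 'Al', 20), (2, 'Tim', 5), (2, 'Al', 8),
    (3, 'Tim', 3), (4, 'Al', 12), (5, 'Tim', 1), (5, 'Tim', 2);
INSERT INTO locations VALUES (1, 'Toolshop', 30), (4, 'Toolshop', 15);
""")

sql, params = build_hits_query(["Tim", "Al"], ["Toolshop"])
at_least_two = [row[0] for row in conn.execute(sql, params)]

sql_all, params_all = build_hits_query(["Tim", "Al"], ["Toolshop"], min_hits=3)
all_three = [row[0] for row in conn.execute(sql_all, params_all)]
print(at_least_two, all_three)
```

For psycopg2, pass placeholder="%s". Whether this generated form actually beats the UNION ALL variant on real data is something only EXPLAIN ANALYZE can tell.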