Effective query merging more than 2 subqueries

Posted 2019-09-16 16:31

I have a database with these tables:

books          (primary key: bookID)
characterNames (foreign key: books.bookID) 
locations      (foreign key: books.bookID)

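The positions in the text of character names and locations are saved in the corresponding tables.
I am writing a Python script using psycopg2 to find all occurrences of a given character name and location in books. I only want the occurrences in books where both the character name and the location are found.
I already have a solution for searching one location and one character: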

WITH b AS (  
    SELECT bookid  
    FROM   characternames  
    WHERE  name = 'XXX'  
    GROUP  BY 1  
    INTERSECT  
    SELECT bookid  
    FROM   locations  
    WHERE  locname = 'YYY'
    GROUP  BY 1  
    )  
SELECT bookid, position, 'char' AS what  
FROM   b  
JOIN   characternames USING (bookid)  
WHERE  name = 'XXX'  
UNION  ALL  
SELECT bookid, position, 'loc' AS what  
FROM   b  
JOIN   locations USING (bookid)  
WHERE  locname = 'YYY'  
ORDER  BY bookid, position;  

The CTE "b" contains all bookids in which both the character name 'XXX' and the location 'YYY' appear.
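To make the INTERSECT step concrete, here is a minimal sketch that runs the same query shape from Python. It uses the standard-library sqlite3 module with an in-memory database as a stand-in for PostgreSQL and psycopg2, and the sample rows and the helper name `occurrences` are made up for illustration. With psycopg2 the placeholders would be `%s` instead of `?`.

```python
import sqlite3

# In-memory stand-in for the real database; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books          (bookid INTEGER PRIMARY KEY);
    CREATE TABLE characternames (bookid INTEGER, name TEXT, position INTEGER);
    CREATE TABLE locations      (bookid INTEGER, locname TEXT, position INTEGER);
    INSERT INTO books VALUES (1), (2), (3);
    INSERT INTO characternames VALUES (1, 'Tim', 10), (1, 'Tim', 40), (2, 'Tim', 5);
    INSERT INTO locations VALUES (1, 'Toolshop', 25), (3, 'Toolshop', 7);
""")

QUERY = """
WITH b AS (
    -- INTERSECT already removes duplicates, so no GROUP BY is needed here
    SELECT bookid FROM characternames WHERE name = ?
    INTERSECT
    SELECT bookid FROM locations WHERE locname = ?
)
SELECT bookid, position, 'char' AS what
FROM   b JOIN characternames USING (bookid)
WHERE  name = ?
UNION ALL
SELECT bookid, position, 'loc' AS what
FROM   b JOIN locations USING (bookid)
WHERE  locname = ?
ORDER  BY bookid, position
"""

def occurrences(conn, name, locname):
    """Return (bookid, position, what) rows for books containing both terms."""
    return conn.execute(QUERY, (name, locname, name, locname)).fetchall()

rows = occurrences(conn, "Tim", "Toolshop")
print(rows)  # only book 1 contains both 'Tim' and 'Toolshop'
```

Books 2 and 3 each contain only one of the two terms, so they drop out of "b" and contribute no rows.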

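Now I am also wondering about searching for 2 locations and one name (or, respectively, 2 names and one location). That is simple if all searched entities must occur in one book, but what about this case:
Search: Tim, Al, Toolshop. Result: books that include
(Tim, Al, Toolshop) or
(Tim, Al) or
(Tim, Toolshop) or
(Al, Toolshop)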

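The problem can be repeated for 4, 5, 6 ... conditions.
I thought about INTERSECTing more subqueries, but that would not work.
Instead I would UNION the found bookids, GROUP them, and then select the bookids that occur more than once: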

WITH b AS (  
    SELECT bookid, count(bookid) AS occurrences  
    FROM  
        (SELECT DISTINCT bookid  
        FROM characterNames  
        WHERE name='XXX'  
        UNION  
        SELECT DISTINCT bookid  
        FROM characterNames  
        WHERE name='YYY'  
        UNION  
        SELECT DISTINCT bookid  
        FROM locations  
        WHERE locname='ZZZ'  
        GROUP BY bookid)  
    WHERE occurrences>1)  

I think this works, although I cannot test it at the moment. Is it the best way to do this?

Answer 1:

The idea of using a count for the generalized case is sound. The syntax needs a couple of adjustments, though:

WITH b AS (  
   SELECT bookid
   FROM  (
      SELECT DISTINCT bookid  
      FROM   characterNames  
      WHERE  name='XXX'  

      UNION ALL  
      SELECT DISTINCT bookid  
      FROM   characterNames  
      WHERE  name='YYY'  

      UNION ALL
      SELECT DISTINCT bookid  
      FROM   locations  
      WHERE  locname='ZZZ'  
      ) x
   GROUP  BY bookid
   HAVING count(*) > 1
   )
SELECT bookid, position, 'char' AS what
FROM   b
JOIN   characternames USING (bookid)
WHERE  name IN ('XXX', 'YYY')

UNION  ALL
SELECT bookid, position, 'loc' AS what
FROM   b
JOIN   locations USING (bookid)
WHERE  locname = 'ZZZ'
ORDER  BY bookid, position;

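Notes

  • Use UNION ALL (not UNION) to preserve duplicates between the subqueries. In this case you want them, so that you can count them.

  • Each subquery is supposed to produce distinct values. That works with DISTINCT the way you have it. You may want to try GROUP BY 1 instead and see whether it performs better (I don't expect it to).

  • The GROUP BY has to go outside the subquery. Where you had it, it would apply only to the last subquery, where it makes no sense because you already have DISTINCT bookid.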

  • The check for whether a book has more than one hit has to go into a HAVING clause:

      HAVING count(*) > 1 

    You cannot use aggregate values in a WHERE clause.
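The notes above generalize to any number of search terms. As a rough sketch, the UNION ALL branches can be generated from lists of names and locations. The function name `books_with_min_hits`, the `min_hits` parameter and the sample data are assumptions for illustration, and sqlite3 stands in for PostgreSQL so the example runs on its own (psycopg2 would use `%s` placeholders).

```python
import sqlite3

def make_demo_db():
    """Build a hypothetical sample database in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE characternames (bookid INTEGER, name TEXT, position INTEGER);
        CREATE TABLE locations      (bookid INTEGER, locname TEXT, position INTEGER);
        INSERT INTO characternames VALUES
            (1, 'Tim', 10), (1, 'Al', 20),
            (2, 'Tim', 5),  (2, 'Al', 15),
            (3, 'Tim', 8),
            (4, 'Al', 3),   (4, 'Al', 9);
        INSERT INTO locations VALUES
            (1, 'Toolshop', 30), (3, 'Toolshop', 12), (5, 'Toolshop', 4);
    """)
    return conn

def books_with_min_hits(conn, names, locnames, min_hits=2):
    """Return bookids in which at least min_hits of the search terms occur."""
    if not names and not locnames:
        raise ValueError("need at least one search term")
    # One DISTINCT subquery per term, so each term counts at most once per book.
    parts = ["SELECT DISTINCT bookid FROM characternames WHERE name = ?"] * len(names)
    parts += ["SELECT DISTINCT bookid FROM locations WHERE locname = ?"] * len(locnames)
    # UNION ALL keeps one row per matching term; HAVING filters on the count.
    sql = (
        "SELECT bookid FROM (" + " UNION ALL ".join(parts) + ") x "
        "GROUP BY bookid HAVING count(*) >= ? ORDER BY bookid"
    )
    return [row[0] for row in conn.execute(sql, [*names, *locnames, min_hits])]

conn = make_demo_db()
print(books_with_min_hits(conn, ["Tim", "Al"], ["Toolshop"]))  # [1, 2, 3]
```

Book 4 mentions Al twice but still counts as a single hit, because the DISTINCT in each branch collapses repeated occurrences before the outer count. Only the fixed SQL fragments are concatenated; the search terms themselves are always passed as parameters.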


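Combining conditions on one table

You cannot simply combine multiple conditions on one table with OR, because then you could not count how many of them matched. There is a somewhat more sophisticated way, though. It may or may not improve performance; you will have to test (with EXPLAIN ANALYZE). Both queries need at least two index scans on the table characterNames. At the very least it shortens the syntax.

Consider how I compute the number of hits for characterNames, and how I changed to sum(hits) in the outer SELECT: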

WITH b AS (  
   SELECT bookid
   FROM  (
      SELECT bookid
           , max((name='XXX')::int)
           + max((name='YYY')::int) AS hits
      FROM   characterNames  
      WHERE  (name='XXX'
           OR name='YYY')
      GROUP  BY bookid

      UNION ALL
      SELECT DISTINCT bookid, 1 AS hits  
      FROM   locations  
      WHERE  locname='ZZZ'  
      ) x
   GROUP  BY bookid
   HAVING sum(hits) > 1
   )
...

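Casting a boolean to integer gives 0 for FALSE and 1 for TRUE. That helps: max() of the cast is 1 if the name occurs in the book at all, and 0 otherwise.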
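The hit counting can be checked on a small data set. The following sketch again uses sqlite3 with made-up sample rows in place of PostgreSQL. It writes the cast as `CAST(... AS INTEGER)`, which is standard SQL and equivalent to the `::int` shorthand, and it uses named placeholders (`:n1`, `:n2`, `:loc`), which are an illustrative choice rather than part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE characternames (bookid INTEGER, name TEXT, position INTEGER);
    CREATE TABLE locations      (bookid INTEGER, locname TEXT, position INTEGER);
    INSERT INTO characternames VALUES
        (1, 'Tim', 10), (1, 'Al', 20),
        (2, 'Tim', 5),  (2, 'Al', 15),
        (3, 'Tim', 8),
        (4, 'Al', 3),   (4, 'Al', 9);
    INSERT INTO locations VALUES
        (1, 'Toolshop', 30), (3, 'Toolshop', 12), (5, 'Toolshop', 4);
""")

QUERY = """
SELECT bookid, sum(hits) AS total_hits
FROM  (
    -- one pass over characternames: each max() is 1 if that name occurs in the book
    SELECT bookid
         , max(CAST(name = :n1 AS INTEGER))
         + max(CAST(name = :n2 AS INTEGER)) AS hits
    FROM   characternames
    WHERE  name IN (:n1, :n2)
    GROUP  BY bookid

    UNION ALL
    SELECT DISTINCT bookid, 1 AS hits
    FROM   locations
    WHERE  locname = :loc
    ) x
GROUP  BY bookid
HAVING sum(hits) > 1
ORDER  BY bookid
"""

params = {"n1": "Tim", "n2": "Al", "loc": "Toolshop"}
rows = conn.execute(QUERY, params).fetchall()
print(rows)  # (bookid, total_hits) for books matching at least two terms
```

Book 4 contains Al twice, but max() caps that name's contribution at 1, so the book ends up with a single hit and is filtered out by the HAVING clause.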


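Faster with EXISTS

While I was riding my bike to work, this kept nagging at the back of my head. I have reason to believe that the following query might be faster. Please give it a try: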

WITH b AS (  
   SELECT bookid, c_hits, l_hits
   FROM  (
      SELECT bookid

           , (EXISTS (
               SELECT *
               FROM   characterNames c
               WHERE  c.bookid = b.bookid
               AND    c.name = 'XXX'))::int
           + (EXISTS (
               SELECT *
               FROM   characterNames c
               WHERE  c.bookid = b.bookid
               AND    c.name = 'YYY'))::int AS c_hits

           , (EXISTS (
               SELECT *
               FROM   locations l
               WHERE  l.bookid = b.bookid
               AND    l.locname='ZZZ'))::int AS l_hits
      FROM   books b  
      ) x
   -- output aliases cannot be referenced in WHERE at the same query level,
   -- so the filter is applied one level up
   WHERE  c_hits + l_hits > 1
   )
SELECT c.bookid, c.position, 'char' AS what
FROM   b
JOIN   characternames c USING (bookid)
WHERE  b.c_hits > 0
AND    c.name IN ('XXX', 'YYY')

UNION  ALL
SELECT l.bookid, l.position, 'loc' AS what
FROM   b
JOIN   locations l USING (bookid)
WHERE  b.l_hits > 0
AND    l.locname = 'ZZZ'
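ORDER  BY 1,2,3;

  • The EXISTS semi-join can stop executing at the first match. Since we are only interested in an all-or-nothing answer in the CTE, this could do the job much faster.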

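  • This way we also do not need to aggregate (no GROUP BY necessary).

  • I also remember whether any characters or locations were found, and only revisit the tables that actually have matches.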
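To try the EXISTS variant end to end, here is a sketch on a small in-memory sqlite3 database, used as a stand-in for PostgreSQL. The sample rows, the alias `bk` and the named placeholders are assumptions for illustration. The hit columns are computed in an inner subquery and filtered one level up, since a WHERE clause cannot see output aliases defined in the same SELECT list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books          (bookid INTEGER PRIMARY KEY);
    CREATE TABLE characternames (bookid INTEGER, name TEXT, position INTEGER);
    CREATE TABLE locations      (bookid INTEGER, locname TEXT, position INTEGER);
    INSERT INTO books VALUES (1), (2), (3), (4), (5);
    INSERT INTO characternames VALUES
        (1, 'Tim', 10), (1, 'Al', 20),
        (2, 'Tim', 5),  (2, 'Al', 15),
        (3, 'Tim', 8),
        (4, 'Al', 3),   (4, 'Al', 9);
    INSERT INTO locations VALUES
        (1, 'Toolshop', 30), (3, 'Toolshop', 12), (5, 'Toolshop', 4);
""")

QUERY = """
WITH b AS (
    SELECT bookid, c_hits, l_hits
    FROM  (
        SELECT bookid
             , CAST(EXISTS (SELECT 1 FROM characternames c
                            WHERE c.bookid = bk.bookid AND c.name = :n1) AS INTEGER)
             + CAST(EXISTS (SELECT 1 FROM characternames c
                            WHERE c.bookid = bk.bookid AND c.name = :n2) AS INTEGER) AS c_hits
             , CAST(EXISTS (SELECT 1 FROM locations l
                            WHERE l.bookid = bk.bookid AND l.locname = :loc) AS INTEGER) AS l_hits
        FROM   books bk
        ) x
    WHERE c_hits + l_hits > 1       -- keep books matching at least two terms
)
SELECT bookid, position, 'char' AS what
FROM   b JOIN characternames USING (bookid)
WHERE  c_hits > 0                   -- skip books without any character hit
AND    name IN (:n1, :n2)
UNION ALL
SELECT bookid, position, 'loc' AS what
FROM   b JOIN locations USING (bookid)
WHERE  l_hits > 0                   -- skip books without a location hit
AND    locname = :loc
ORDER  BY 1, 2, 3
"""

params = {"n1": "Tim", "n2": "Al", "loc": "Toolshop"}
rows = conn.execute(QUERY, params).fetchall()
for row in rows:
    print(row)
```

Whether this beats the UNION ALL version on real data depends on the available indexes and the planner, so it is worth comparing both with EXPLAIN ANALYZE in PostgreSQL.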



Source: Effective query merging more than 2 subqueries