if a non-correlated subquery is repeated at severa

Posted 2020-07-14 09:23

Question:

Suppose I have a query like this:

SELECT date_trunc('day', assigndate)e,
       count(CASE WHEN a.assigneeid = 65548
             AND a.assigneeid IN
               (SELECT userid
                FROM groupmembers
                WHERE groupid = 65553) THEN 1 ELSE NULL END) assigned,
       count(CASE WHEN a.assigneeid = 65548
             AND a.completedtime IS NOT NULL
             AND a.assigneeid IN
               (SELECT userid
                FROM groupmembers
                WHERE groupid = 65553) THEN 1 ELSE NULL END) completed
FROM ASSIGNMENT a
WHERE assigndate > CURRENT_TIMESTAMP - interval '20 days'
GROUP BY date_trunc('day',assigndate);

The subquery in question is

SELECT userid
                FROM groupmembers
                WHERE groupid = 65553

Since the subquery is not correlated with the parent query, it should be executed just once and its cached result reused. But because the subquery appears in two places in the query, the query plan shows it being evaluated twice. Is there any way to cache the result of that subquery and use it in both places?

The subquery can't be converted to a join, as there is no single field on which to join (and it can't be an unconditional join, because the counts would then be wrong).

Answer 1:

You can use a common table expression (WITH):

with cte as 
(
     SELECT userid FROM groupmembers WHERE groupid = 65553
)
SELECT 
    date_trunc('day', assigndate)e,  
    count(CASE WHEN a.assigneeid = 65548 AND a.assigneeid IN  
           (SELECT userid from cte) then 1 else null end) assigned,
...
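
For reference, here is one way the complete statement might look. This is only a sketch: it assumes the second count follows the same pattern as in the question, and the MATERIALIZED keyword is an addition that only exists in PostgreSQL 12 and later. On older versions a CTE is always computed once, so the keyword can simply be dropped. Since PostgreSQL 12, a CTE referenced more than once is materialized by default anyway, so the keyword here mostly documents the intent. It is worth confirming with EXPLAIN that the plan shows a single CTE node scanned from both places.

```sql
-- Sketch of the full CTE form; table and column names are taken from the question.
-- MATERIALIZED requires PostgreSQL 12+; remove it on older versions.
WITH cte AS MATERIALIZED (
    SELECT userid FROM groupmembers WHERE groupid = 65553
)
SELECT date_trunc('day', assigndate) AS e,
       -- assignments given to user 65548, provided the user is in the group
       count(CASE WHEN a.assigneeid = 65548
                   AND a.assigneeid IN (SELECT userid FROM cte)
                  THEN 1 END) AS assigned,
       -- same condition, restricted to completed assignments
       count(CASE WHEN a.assigneeid = 65548
                   AND a.completedtime IS NOT NULL
                   AND a.assigneeid IN (SELECT userid FROM cte)
                  THEN 1 END) AS completed
FROM assignment a
WHERE assigndate > CURRENT_TIMESTAMP - interval '20 days'
GROUP BY date_trunc('day', assigndate);
```

A CASE without an ELSE branch yields NULL, which count() ignores, so the explicit "else null" from the question can be left out.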


Answer 2:

You should rewrite the query to eliminate the subqueries:

SELECT date_trunc('day', assigndate)e,
       sum(CASE WHEN a.assigneeid = 65548 and gm.userid is not null then 1 else 0
           end) as assigned,
       sum(CASE WHEN a.assigneeid = 65548 and a.completedtime IS NOT NULL and gm.userid is not null
                then 1 else 0
           end) as completed
FROM ASSIGNMENT a left outer join
     (select distinct userid
      from groupmembers
      where groupid = 65553
     ) gm
     on a.assigneeid = gm.userid
WHERE assigndate > CURRENT_TIMESTAMP - interval '20 days'
GROUP BY date_trunc('day',assigndate)
order by 1

In general, I think it is good practice to keep table references in the FROM (or WITH) clauses. It can be hard to follow the logic of subqueries in the SELECT clause. In this case, the subqueries are so similar that they are practically begging to be combined into a single statement.
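
The "select distinct" in the derived table is what keeps the counts correct, which addresses the concern in the question about a join inflating them. The toy script below illustrates this. It is a sketch on made-up data: the column types and the duplicate membership row are assumptions, and the real groupmembers table may already guarantee uniqueness.

```sql
-- Hypothetical toy data; names follow the question, column types are assumed.
CREATE TEMP TABLE groupmembers (userid int, groupid int);
CREATE TEMP TABLE assignment (
    assigneeid    int,
    assigndate    timestamp,
    completedtime timestamp
);

-- The same user is deliberately listed twice in the group.
INSERT INTO groupmembers VALUES (65548, 65553), (65548, 65553);

INSERT INTO assignment VALUES
    (65548, CURRENT_TIMESTAMP - interval '1 day', NULL),
    (65548, CURRENT_TIMESTAMP - interval '1 day', CURRENT_TIMESTAMP),
    (70000, CURRENT_TIMESTAMP - interval '1 day', NULL);

-- Without DISTINCT each matching assignment is joined to both membership rows:
-- 2 assignments x 2 memberships + 1 unmatched row = 5 joined rows.
SELECT count(*) AS joined_rows
FROM assignment a
LEFT OUTER JOIN (SELECT userid
                 FROM groupmembers
                 WHERE groupid = 65553) gm
     ON a.assigneeid = gm.userid;

-- With DISTINCT every assignment appears exactly once: 3 joined rows.
SELECT count(*) AS joined_rows
FROM assignment a
LEFT OUTER JOIN (SELECT DISTINCT userid
                 FROM groupmembers
                 WHERE groupid = 65553) gm
     ON a.assigneeid = gm.userid;
```

Because the left join leaves gm.userid NULL for assignees outside the group, the "gm.userid is not null" test in the answer plays the same role as the IN subquery in the original query.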