Better way than multiple SELECT statements?

Posted 2019-08-08 11:17

Question:

I'm creating a web app that displays a pie chart. In order to get all the data for the chart from a PostgreSQL 9.3 database in a single HTTP request, I'm combining multiple SELECT statements with UNION ALL — here's a portion:

SELECT 'spf' as type, COUNT(*)
    FROM (SELECT cai.id
          FROM common_activityinstance cai
          JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
          JOIN common_activitysetting cas ON cas.id = cais.id
          JOIN quizzes_quiz q ON q.id = cai.activity_id
          WHERE cai.end_time::date = '2015-09-12'
          AND q.name != 'Exit Ticket Quiz'
          AND cai.activity_type = 'QZ'
          AND (cas.key = 'disable_student_nav' AND cas.value = 'True'
            OR cas.key = 'pacing' AND cas.value = 'student')
          GROUP BY cai.id
          HAVING COUNT(cai.id) = 2) sub
UNION ALL
SELECT 'spn' as type, COUNT(*)
    FROM common_activityinstance cai
    JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
    JOIN common_activitysetting cas ON cas.id = cais.id
    WHERE cai.end_time::date = '2015-09-12'
    AND cai.activity_type = 'QZ'
    AND cas.key = 'disable_student_nav'
    AND cas.value = 'False'
UNION ALL
SELECT 'tp' as type, COUNT(*)
    FROM (SELECT cai.id 
          FROM common_activityinstance cai
          JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
          JOIN common_activitysetting cas ON cas.id = cais.id
          WHERE cai.end_time::date = '2015-09-12'
          AND cai.activity_type = 'QZ'
          AND cas.key = 'pacing' AND cas.value = 'teacher') sub;

This produces a nice, small response for sending back to the client:

 type |  count 
------+---------
 spf  |  100153
 spn  |   96402
 tp   |   84211

I wonder if my queries can be made more efficient. Each SELECT statement uses mostly the same JOIN operations. Is there a way to not repeat the JOIN for each new SELECT?
And I would actually prefer a single row with 3 columns.

Or, in general, is there some entirely different but better approach than what I'm doing?

Answer 1:

You can bundle most of the cost in a single main query in a CTE and reuse the result several times.
This returns a single row with three columns named after each type (as requested in the comment):

WITH cte AS (
   SELECT cai.id, cai.activity_id, cas.key, cas.value
   FROM   common_activityinstance cai
   JOIN   common_activityinstance_settings s ON s.activityinstance_id = cai.id
   JOIN   common_activitysetting cas ON cas.id = s.id
   WHERE  cai.end_time::date = '2015-09-12'   -- problem?
   AND    cai.activity_type = 'QZ'
   AND   (cas.key = 'disable_student_nav' AND cas.value IN ('True', 'False') OR
          cas.key = 'pacing' AND cas.value IN ('student', 'teacher'))
   )
SELECT *
FROM  (
   SELECT count(*) AS spf
   FROM  (
      SELECT c.id
      FROM   cte c
      JOIN   quizzes_quiz q ON q.id = c.activity_id
      WHERE  q.name <> 'Exit Ticket Quiz'
      AND   (c.key, c.value) IN (('disable_student_nav', 'True')
                               , ('pacing', 'student'))
      GROUP  BY 1
      HAVING count(*) = 2
      ) sub
   ) spf
,  (
   SELECT count(key = 'disable_student_nav' AND value = 'False' OR NULL) AS spn
        , count(key = 'pacing' AND value = 'teacher' OR NULL) AS tp
   FROM   cte
   ) spn_tp;

Should work for Postgres 9.3. In Postgres 9.4 you can use the new aggregate FILTER clause:

  count(*) FILTER (WHERE key = 'disable_student_nav' AND value = 'False') AS spn
, count(*) FILTER (WHERE key = 'pacing' AND value = 'teacher') AS tp

Details for both syntax variants:

  • How can I simplify this game statistics query?
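To see why both counting variants only pick up matching rows, here is a minimal, self-contained sketch. The temporary table demo_settings and its four rows are made up for illustration and merely stand in for the cte result. The FILTER form assumes Postgres 9.4 or later.

```sql
-- Hypothetical toy data standing in for the rows produced by the CTE.
CREATE TEMP TABLE demo_settings (id int, key text, value text);

INSERT INTO demo_settings VALUES
  (1, 'disable_student_nav', 'False')
, (1, 'pacing',              'teacher')
, (2, 'disable_student_nav', 'False')
, (3, 'pacing',              'student');

-- Works in 9.3: AND binds tighter than OR, so each expression is
-- (condition) OR NULL, which is true for a match and NULL otherwise.
-- count(expression) only counts non-null values.
SELECT count(key = 'disable_student_nav' AND value = 'False' OR NULL) AS spn
     , count(key = 'pacing' AND value = 'teacher' OR NULL)            AS tp
FROM   demo_settings;
-- spn = 2, tp = 1

-- Same result with the aggregate FILTER clause (Postgres 9.4+):
SELECT count(*) FILTER (WHERE key = 'disable_student_nav' AND value = 'False') AS spn
     , count(*) FILTER (WHERE key = 'pacing' AND value = 'teacher')            AS tp
FROM   demo_settings;
```

Either way the table is scanned once and both counts come back in a single row.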

The condition marked problem? may be a big performance problem, depending on the data type of cai.end_time. For one, it's not sargable: the cast is applied to the column, so a plain index on end_time cannot be used. And if it's a timestamptz type, the expression is hard to index, because the result depends on the current time zone setting of the session, which can also lead to different results when executed in different time zones.

Compare:

  • Sustract two queries from same table
  • Subtract hours from the now() function
  • Ignoring timezones altogether in Rails and PostgreSQL

You just have to name the time zone that is supposed to define your date. Taking my time zone in Vienna as example:

WHERE  cai.end_time >= '2015-09-12 0:0'::timestamp AT TIME ZONE 'Europe/Vienna' 
AND    cai.end_time <  '2015-09-13 0:0'::timestamp AT TIME ZONE 'Europe/Vienna'

You can provide simple timestamptz values as well. You could even just write:

WHERE  cai.end_time >= '2015-09-12'::date
AND    cai.end_time <  '2015-09-12'::date + 1

But the first variant does not depend on the current time zone setting.
Detailed explanation in the links above.

Now the query can use your index and should be much faster if there are many different days in your table.
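The time zone dependency is easy to reproduce in isolation. The following is only a sketch, assuming end_time is of type timestamptz; the sample timestamp is made up for illustration.

```sql
-- The same instant falls on different calendar days depending on the
-- session's time zone setting, so a cast to date is not stable.
SET timezone = 'UTC';
SELECT '2015-09-12 23:30+00'::timestamptz::date;   -- 2015-09-12

SET timezone = 'Europe/Vienna';
SELECT '2015-09-12 23:30+00'::timestamptz::date;   -- 2015-09-13

-- The half-open range with an explicit zone gives the same answer
-- whatever the session setting is: this instant is not on
-- 2015-09-12 in Vienna.
SELECT ts >= '2015-09-12 0:0'::timestamp AT TIME ZONE 'Europe/Vienna'
   AND ts <  '2015-09-13 0:0'::timestamp AT TIME ZONE 'Europe/Vienna' AS on_vienna_day
FROM  (VALUES ('2015-09-12 23:30+00'::timestamptz)) t(ts);
-- on_vienna_day = false

RESET timezone;
```

Because the range predicate compares the bare column against constants, a plain btree index on end_time can serve it.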



Answer 2:

This is only a sketch of a completely different approach: construct a boolean "hypercube" for all the conditions that you need in your cross-tabulation. The logic of selecting or aggregating subsets can be applied later (such as suppressing the exit tickets, for which the business logic is not clear to me).


SELECT DISTINCT not_exit, disabled, pacing
    , COUNT(*) AS the_count
    FROM (SELECT DISTINCT cai.id
          , EXISTS (SELECT *
            FROM quizzes_quiz q 
            WHERE q.id = cai.activity_id AND q.name != 'Exit Ticket Quiz'
            ) AS not_exit
          , EXISTS ( SELECT *
            FROM common_activityinstance_settings cais  
            JOIN common_activitysetting cas ON cas.id = cais.id
            WHERE cai.id = cais.activityinstance_id
            AND cas.key = 'disable_student_nav' AND cas.value = 'True'
            ) AS disabled
          , EXISTS ( SELECT *
            FROM common_activityinstance_settings cais 
            JOIN common_activitysetting cas ON cas.id = cais.id
            WHERE cai.id = cais.activityinstance_id
            AND cas.key = 'pacing' AND cas.value = 'student'
            ) AS pacing
          FROM common_activityinstance cai
          WHERE cai.end_time::date = '2015-09-12' AND cai.activity_type = 'QZ'
    ) my_cube
GROUP BY 1,2,3
ORDER BY 1,2,3
  ;

Final note: This method is based on my assumption that the underlying data model is in fact an EAV-model, and that an attribute can occur at most once per student.
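As a rough illustration of the "aggregate later" step, the cube rows can be collapsed into one of the original counters. The table cube_demo and its numbers are invented for this sketch. The mapping of spf to "all three flags true" is my reading of the question's first SELECT, not something confirmed by the asker.

```sql
-- Hypothetical cube output: one row per combination of the three flags.
CREATE TEMP TABLE cube_demo (
   not_exit  boolean
 , disabled  boolean
 , pacing    boolean
 , the_count bigint
);

INSERT INTO cube_demo VALUES
  (true,  true,  true,  5)
, (true,  true,  false, 2)
, (true,  false, true,  3)
, (false, true,  true,  1);

-- spf: not an exit ticket quiz, navigation disabled, student-paced.
SELECT sum(the_count) AS spf
FROM   cube_demo
WHERE  not_exit AND disabled AND pacing;
-- spf = 5
```

The other counters would need their own flags in the cube (for instance for the values 'False' and 'teacher'), which the sketch above does not compute.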



Answer 3:

This is a partial answer. The last two SELECT statements can be combined into one query:

SELECT (case when key = 'disable_student_nav' then 'spn' 
             when key = 'pacing' then 'tp'
        end) as type, COUNT(*)
FROM common_activityinstance cai JOIN
     common_activityinstance_settings cais
     ON cai.id = cais.activityinstance_id JOIN
     common_activitysetting cas
     ON cas.id = cais.id
WHERE cai.end_time::date = '2015-09-12' AND cai.activity_type = 'QZ' AND
      (key, value) in (('disable_student_nav', 'False'), ('pacing', 'teacher'))
GROUP BY type

I wonder if there is a way to put the first group into similar logic. For instance, if the QZ condition could be applied to all three groups, then adding in the first group would be easy.



Answer 4:

You can use case with the conditions in your where clause for each type. However, the first query's having condition would not be satisfied by this.

select type, count(*) as count
from
(
SELECT cai.id,
case when q.name!= 'Exit Ticket Quiz' and key = 'disable_student_nav' 
AND value = 'True' OR key = 'pacing' AND value = 'student' then 'spf'
     when key = 'disable_student_nav' AND value = 'False' then 'spn'
     when key = 'pacing' AND value = 'teacher' then 'tp'
 end as type
      FROM common_activityinstance cai
      JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
      JOIN common_activitysetting cas ON cas.id = cais.id
      JOIN quizzes_quiz q ON q.id = cai.activity_id
      WHERE cai.end_time::date = '2015-09-12'
      AND q.name != 'Exit Ticket Quiz'
      AND cai.activity_type = 'QZ'
) t
group by type


Answer 5:

There's no way to make that query much more efficient, no. You could set up a view or whatever, but it'll always have to run through it three times. But you could solve the problem by doing some post-processing in PHP or PL/pgSQL or whatever. Start with a simpler query, something like this:

SELECT COUNT(*), cai.id, q.name, key, value
FROM common_activityinstance cai
JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
JOIN common_activitysetting cas ON cas.id = cais.id
JOIN quizzes_quiz q ON q.id = cai.activity_id  -- needed because q.name is selected
WHERE cai.end_time::date = '2015-09-12'
GROUP BY cai.id, q.name, key, value

...I'm not clear from your explanation whether this would result in a reasonable number of output rows. But assuming it does, write a little code to massage them into the shape you want.