Question:
I'm creating a web app that displays a pie chart. In order to get all the data for the chart from a PostgreSQL 9.3 database in a single HTTP request, I'm combining multiple SELECT statements with UNION ALL. Here's a portion:
SELECT 'spf' as type, COUNT(*)
FROM  (SELECT cai.id
       FROM   common_activityinstance cai
       JOIN   common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
       JOIN   common_activitysetting cas ON cas.id = cais.id
       JOIN   quizzes_quiz q ON q.id = cai.activity_id
       WHERE  cai.end_time::date = '2015-09-12'
       AND    q.name != 'Exit Ticket Quiz'
       AND    cai.activity_type = 'QZ'
       AND   (cas.key = 'disable_student_nav' AND cas.value = 'True'
           OR cas.key = 'pacing' AND cas.value = 'student')
       GROUP  BY cai.id
       HAVING COUNT(cai.id) = 2) sub
UNION ALL
SELECT 'spn' as type, COUNT(*)
FROM   common_activityinstance cai
JOIN   common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
JOIN   common_activitysetting cas ON cas.id = cais.id
WHERE  cai.end_time::date = '2015-09-12'
AND    cai.activity_type = 'QZ'
AND    cas.key = 'disable_student_nav'
AND    cas.value = 'False'
UNION ALL
SELECT 'tp' as type, COUNT(*)
FROM  (SELECT cai.id
       FROM   common_activityinstance cai
       JOIN   common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
       JOIN   common_activitysetting cas ON cas.id = cais.id
       WHERE  cai.end_time::date = '2015-09-12'
       AND    cai.activity_type = 'QZ'
       AND    cas.key = 'pacing' AND cas.value = 'teacher') sub;
This produces a nice, small response for sending back to the client:
 type | count
------+--------
 spf  | 100153
 spn  |  96402
 tp   |  84211
I wonder if my queries can be made more efficient. Each SELECT statement uses mostly the same JOIN operations. Is there a way to not repeat the JOIN for each new SELECT?
And I would actually prefer a single row with 3 columns.
Or, in general, is there some entirely different but better approach than what I'm doing?
Answer 1:
You can bundle most of the cost in a single main query in a CTE and reuse the result several times.
This returns a single row with three columns named after each type (as requested in the comment):
WITH cte AS (
   SELECT cai.id, cai.activity_id, cas.key, cas.value
   FROM   common_activityinstance cai
   JOIN   common_activityinstance_settings s ON s.activityinstance_id = cai.id
   JOIN   common_activitysetting cas ON cas.id = s.id
   WHERE  cai.end_time::date = '2015-09-12'   -- problem?
   AND    cai.activity_type = 'QZ'
   AND   (cas.key = 'disable_student_nav' AND cas.value IN ('True', 'False') OR
          cas.key = 'pacing' AND cas.value IN ('student', 'teacher'))
   )
SELECT *
FROM  (
   SELECT count(*) AS spf
   FROM  (
      SELECT c.id
      FROM   cte c
      JOIN   quizzes_quiz q ON q.id = c.activity_id
      WHERE  q.name <> 'Exit Ticket Quiz'
      AND    (c.key, c.value) IN (('disable_student_nav', 'True')
                                , ('pacing', 'student'))
      GROUP  BY 1
      HAVING count(*) = 2
      ) sub
   ) spf
    , (
   SELECT count(key = 'disable_student_nav' AND value = 'False' OR NULL) AS spn
        , count(key = 'pacing' AND value = 'teacher' OR NULL) AS tp
   FROM   cte
   ) spn_tp;
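To try the shape of this query without the real schema, here is a minimal sketch that replaces the joined tables with a hypothetical inline VALUES list. The rows are invented for illustration, and the quizzes_quiz join is left out. It keeps the same two parts, a HAVING count(*) = 2 subquery for spf and conditional counts for spn and tp, and cross-joins them into one row.

```sql
-- Hypothetical sample rows standing in for the joined settings tables.
WITH cte(id, key, value) AS (
   VALUES (1, 'disable_student_nav', 'True')
        , (1, 'pacing', 'student')              -- id 1 has both spf settings
        , (2, 'disable_student_nav', 'True')    -- id 2 has only one of them
        , (3, 'disable_student_nav', 'False')
        , (4, 'pacing', 'teacher')
        , (5, 'pacing', 'teacher')
   )
SELECT *
FROM  (
   SELECT count(*) AS spf
   FROM  (
      SELECT c.id
      FROM   cte c
      WHERE  (c.key, c.value) IN (('disable_student_nav', 'True')
                                , ('pacing', 'student'))
      GROUP  BY 1
      HAVING count(*) = 2                       -- both settings present
      ) sub
   ) spf
    , (
   -- "expr OR NULL" is NULL unless expr is true, and count() skips NULLs
   SELECT count(key = 'disable_student_nav' AND value = 'False' OR NULL) AS spn
        , count(key = 'pacing' AND value = 'teacher' OR NULL) AS tp
   FROM   cte
   ) spn_tp;
-- returns one row: spf = 1, spn = 1, tp = 2
```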
Should work for Postgres 9.3. In Postgres 9.4 you can use the new aggregate FILTER clause:
count(*) FILTER (WHERE key = 'disable_student_nav' AND value = 'False') AS spn
, count(*) FILTER (WHERE key = 'pacing' AND value = 'teacher') AS tp
Details for both syntax variants:
- How can I simplify this game statistics query?
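On 9.4 or later, the spn/tp subquery can be written as a complete statement with FILTER. Here is a minimal sketch on invented inline rows rather than the real tables:

```sql
-- Requires PostgreSQL 9.4+ for the aggregate FILTER clause.
-- Hypothetical sample rows in place of the CTE from the answer.
WITH cte(id, key, value) AS (
   VALUES (3, 'disable_student_nav', 'False')
        , (4, 'pacing', 'teacher')
        , (5, 'pacing', 'teacher')
        , (6, 'pacing', 'student')
   )
SELECT count(*) FILTER (WHERE key = 'disable_student_nav' AND value = 'False') AS spn
     , count(*) FILTER (WHERE key = 'pacing' AND value = 'teacher') AS tp
FROM   cte;
-- returns one row: spn = 1, tp = 2
```

Both forms count the same rows. FILTER just states the condition more directly than the "OR NULL" trick.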
The condition marked "problem?" may be a big performance problem, depending on the data type of cai.end_time. For one, it's not sargable: a plain index on end_time cannot be used for a condition on end_time::date. And if it's a timestamptz type, the expression is hard to index, because the result depends on the current time zone setting of the session, which can also lead to different results when executed in different time zones.
Compare:
- Sustract two queries from same table
- Subtract hours from the now() function
- Ignoring timezones altogether in Rails and PostgreSQL
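The indexing difference can be seen on two throwaway tables. This is a minimal sketch, and demo_ts and demo_tstz are hypothetical names. An expression index on end_time::date is accepted for timestamp, because that cast is immutable. It is rejected for timestamptz, because that cast depends on the session's TimeZone setting.

```sql
-- Throwaway tables (hypothetical names), only to compare the two data types.
CREATE TEMP TABLE demo_ts   (end_time timestamp);
CREATE TEMP TABLE demo_tstz (end_time timestamptz);

-- Works: timestamp -> date does not depend on any session setting.
CREATE INDEX ON demo_ts ((end_time::date));

-- Fails with "functions in index expression must be marked IMMUTABLE",
-- because timestamptz -> date depends on the current TimeZone.
CREATE INDEX ON demo_tstz ((end_time::date));
```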
You just have to name the time zone that is supposed to define your date. Taking my time zone in Vienna as an example:
WHERE cai.end_time >= '2015-09-12 0:0'::timestamp AT TIME ZONE 'Europe/Vienna'
AND cai.end_time < '2015-09-13 0:0'::timestamp AT TIME ZONE 'Europe/Vienna'
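To see which instants those bounds denote, you can evaluate them directly. In this minimal sketch, the session time zone is set to UTC only to make the displayed values easy to read. The bounds themselves are the same instants whatever TimeZone is in effect.

```sql
SET timezone = 'UTC';   -- affects only how the results are displayed

SELECT '2015-09-12 0:0'::timestamp AT TIME ZONE 'Europe/Vienna' AS lower_bound
     , '2015-09-13 0:0'::timestamp AT TIME ZONE 'Europe/Vienna' AS upper_bound;
-- lower_bound: 2015-09-11 22:00:00+00   (Vienna is on CEST, UTC+2, in September)
-- upper_bound: 2015-09-12 22:00:00+00
```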
You can provide plain timestamptz values as well. You could even just write:
WHERE cai.end_time >= '2015-09-12'::date
AND cai.end_time < '2015-09-12'::date + 1
But the first variant does not depend on the current time zone setting.
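The dependence of the date variant on the session can be checked by casting the same date to timestamptz under two TimeZone settings. This is a minimal sketch. When end_time is timestamptz, the comparison with a date goes through the same kind of conversion.

```sql
SET timezone = 'UTC';
SELECT '2015-09-12'::date::timestamptz;   -- 2015-09-12 00:00:00+00

SET timezone = 'Europe/Vienna';
SELECT '2015-09-12'::date::timestamptz;   -- 2015-09-12 00:00:00+02
-- Same literal, two instants two hours apart, so the date-based
-- range can select different rows in differently configured sessions.
```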
Detailed explanation in the links above.
Now the query can use a plain index on cai.end_time and should be much faster if there are many different days in your table.
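If no suitable index exists yet, a minimal sketch could look like the following. The index name is hypothetical. Putting activity_type first is an assumption that only pays off because these queries always filter on it with equality. Whether the planner actually picks the index depends on the data and statistics, so check with EXPLAIN.

```sql
-- Hypothetical index name; equality column first, range column second.
CREATE INDEX common_activityinstance_type_end_idx
    ON common_activityinstance (activity_type, end_time);

-- Look for an Index Scan or Bitmap Index Scan on the new index
-- instead of a Seq Scan on common_activityinstance.
EXPLAIN
SELECT count(*)
FROM   common_activityinstance cai
WHERE  cai.activity_type = 'QZ'
AND    cai.end_time >= '2015-09-12 0:0'::timestamp AT TIME ZONE 'Europe/Vienna'
AND    cai.end_time <  '2015-09-13 0:0'::timestamp AT TIME ZONE 'Europe/Vienna';
```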
Answer 2:
This is only a sketch of a completely different approach: construct a boolean "hypercube" of all the conditions you need in your "crosstabulation". The logic of selecting or aggregating subsets can be applied later (such as suppressing the exit tickets, for which the business logic is not clear to me).
SELECT not_exit, disabled, pacing
     , COUNT(*) AS the_count
FROM  (SELECT DISTINCT cai.id
            , EXISTS (SELECT *
                      FROM   quizzes_quiz q
                      WHERE  q.id = cai.activity_id AND q.name != 'Exit Ticket Quiz'
                     ) AS not_exit
            , EXISTS (SELECT *
                      FROM   common_activityinstance_settings cais
                      JOIN   common_activitysetting cas ON cas.id = cais.id
                      WHERE  cai.id = cais.activityinstance_id
                      AND    cas.key = 'disable_student_nav' AND cas.value = 'True'
                     ) AS disabled
            , EXISTS (SELECT *
                      FROM   common_activityinstance_settings cais
                      JOIN   common_activitysetting cas ON cas.id = cais.id
                      WHERE  cai.id = cais.activityinstance_id
                      AND    cas.key = 'pacing' AND cas.value = 'student'
                     ) AS pacing
       FROM   common_activityinstance cai
       WHERE  cai.end_time::date = '2015-09-12' AND cai.activity_type = 'QZ'
      ) my_cube
GROUP BY 1,2,3
ORDER BY 1,2,3
;
Final note: This method is based on my assumption that the underlying data model is in fact an EAV-model, and that an attribute can occur at most once per student.
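To see what the cube looks like, here is a minimal sketch of the same EXISTS-per-condition idea on hypothetical inline data. The instances and settings lists are invented, and the quiz-name dimension is omitted. Each instance becomes one combination of booleans, and GROUP BY counts instances per combination. Under the at-most-once assumption above, the (t, t) cell corresponds to spf, ignoring the quiz-name condition.

```sql
-- Hypothetical inline data: instances and their EAV-style settings.
WITH instances(id) AS (
   VALUES (1), (2), (3), (4)
   )
, settings(instance_id, key, value) AS (
   VALUES (1, 'disable_student_nav', 'True'),  (1, 'pacing', 'student')
        , (2, 'disable_student_nav', 'False'), (2, 'pacing', 'teacher')
        , (3, 'disable_student_nav', 'False'), (3, 'pacing', 'student')
        , (4, 'disable_student_nav', 'False'), (4, 'pacing', 'teacher')
   )
SELECT disabled, pacing, count(*) AS the_count
FROM  (
   SELECT i.id
        , EXISTS (SELECT 1 FROM settings s
                  WHERE  s.instance_id = i.id
                  AND    s.key = 'disable_student_nav' AND s.value = 'True') AS disabled
        , EXISTS (SELECT 1 FROM settings s
                  WHERE  s.instance_id = i.id
                  AND    s.key = 'pacing' AND s.value = 'student') AS pacing
   FROM   instances i
   ) my_cube
GROUP  BY 1, 2
ORDER  BY 1, 2;
-- disabled | pacing | the_count
-- f        | f      | 2
-- f        | t      | 1
-- t        | t      | 1
```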
Answer 3:
This is a partial answer. The last two can be combined into one query:
SELECT (case when key = 'disable_student_nav' then 'spn'
             when key = 'pacing' then 'tp'
        end) as type, COUNT(*)
FROM   common_activityinstance cai
JOIN   common_activityinstance_settings cais
       ON cai.id = cais.activityinstance_id
JOIN   common_activitysetting cas
       ON cas.id = cais.id
WHERE  cai.end_time::date = '2015-09-12' AND cai.activity_type = 'QZ' AND
       (key, value) in (('disable_student_nav', 'False'), ('pacing', 'teacher'))
GROUP BY type
I wonder if there is a way to put the first group into similar logic. For instance, if the QZ condition could be applied to all three groups, then adding in the first group would be easy.
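The two-group query in this answer can be tried on hypothetical inline rows. This minimal sketch keeps the row-constructor IN filter and the CASE label, with GROUP BY referring to the output alias type. Nothing else in the input is named type, so the alias is unambiguous.

```sql
-- Hypothetical inline rows in place of the joined settings tables.
WITH cas(key, value) AS (
   VALUES ('disable_student_nav', 'False')
        , ('disable_student_nav', 'True')
        , ('pacing', 'teacher')
        , ('pacing', 'teacher')
        , ('pacing', 'student')
   )
SELECT CASE WHEN key = 'disable_student_nav' THEN 'spn'
            WHEN key = 'pacing'              THEN 'tp'
       END AS type
     , count(*)
FROM   cas
WHERE  (key, value) IN (('disable_student_nav', 'False'), ('pacing', 'teacher'))
GROUP  BY type
ORDER  BY type;
-- spn | 1
-- tp  | 2
```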
Answer 4:
You can use CASE with the conditions in your WHERE clause for each type. However, the first query's HAVING condition would not be satisfied by this.
select type, count(*) as count
from (
   SELECT cai.id,
          -- parentheses make the intended grouping of the spf condition explicit
          case when q.name != 'Exit Ticket Quiz'
                    and (key = 'disable_student_nav' AND value = 'True'
                         OR key = 'pacing' AND value = 'student') then 'spf'
               when key = 'disable_student_nav' AND value = 'False' then 'spn'
               when key = 'pacing' AND value = 'teacher' then 'tp'
          end as type
   FROM   common_activityinstance cai
   JOIN   common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
   JOIN   common_activitysetting cas ON cas.id = cais.id
   JOIN   quizzes_quiz q ON q.id = cai.activity_id
   WHERE  cai.end_time::date = '2015-09-12'
   AND    q.name != 'Exit Ticket Quiz'
   AND    cai.activity_type = 'QZ'
   ) t
group by type
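The query above returns one row per type, while the question asks for a single row with three columns. Here is a minimal sketch of pivoting such a result with conditional aggregation, which also works on 9.3. The per_type list stands in for the grouped output and reuses the numbers shown in the question. Rows that match none of the WHEN branches get a NULL type and form an extra group, and the pivot simply ignores them.

```sql
-- per_type stands in for the output of the grouped query above;
-- the numbers are the ones shown in the question.
WITH per_type(type, cnt) AS (
   VALUES ('spf', 100153)
        , ('spn',  96402)
        , ('tp',   84211)
   )
SELECT max(CASE WHEN type = 'spf' THEN cnt END) AS spf
     , max(CASE WHEN type = 'spn' THEN cnt END) AS spn
     , max(CASE WHEN type = 'tp'  THEN cnt END) AS tp
FROM   per_type;
-- one row: spf = 100153, spn = 96402, tp = 84211
```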
Answer 5:
There's no way to make that query much more efficient, no. You could set up a view or whatever, but it'll always have to run through it three times. But you could solve the problem by doing some post-processing in PHP or PL/pgSQL or whatever. Start with a simpler query, something like this:
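SELECT COUNT(*), cai.id, q.name, key, value
FROM   common_activityinstance cai
JOIN   common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
JOIN   common_activitysetting cas ON cas.id = cais.id
JOIN   quizzes_quiz q ON q.id = cai.activity_id   -- needed for q.name
WHERE  cai.end_time::date = '2015-09-12'
AND    cai.activity_type = 'QZ'                   -- activity_id points to a quiz only for QZ rows
GROUP BY cai.id, q.name, key, value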
...I'm not clear from your explanation whether this would result in a reasonable number of output rows. But assuming it does, write a little code to massage them into the shape you want.
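The post-processing can live in any language. As one possibility inside Postgres itself, here is a minimal PL/pgSQL sketch that walks rows shaped like the simpler query's output (count, id, name, key, value) and sums the counts into spn and tp buckets. It is a swapped-in illustration, not code from the answer. The inline rows are hypothetical. spf is left out because it needs per-id tracking of two settings plus the quiz-name check.

```sql
DO $$
DECLARE
   r   record;
   spn bigint := 0;
   tp  bigint := 0;
BEGIN
   -- Hypothetical rows in the shape (cnt, id, name, key, value).
   FOR r IN
      SELECT *
      FROM  (VALUES (1, 3, 'Quiz A', 'disable_student_nav', 'False')
                  , (1, 4, 'Quiz B', 'pacing', 'teacher')
                  , (1, 5, 'Quiz C', 'pacing', 'teacher')
            ) AS t(cnt, id, name, key, value)
   LOOP
      -- Add each group's count to the matching bucket.
      IF r.key = 'disable_student_nav' AND r.value = 'False' THEN
         spn := spn + r.cnt;
      ELSIF r.key = 'pacing' AND r.value = 'teacher' THEN
         tp := tp + r.cnt;
      END IF;
   END LOOP;
   RAISE NOTICE 'spn = %, tp = %', spn, tp;   -- NOTICE:  spn = 1, tp = 2
END
$$;
```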