Continuation of other question here:
How to get date_part query to hit index?
When executing the following query, it hits a compound index I created on the datelocal, views, impressions, gender, agegroup fields:
SELECT date_part('hour', datelocal) AS hour
, SUM(views) FILTER (WHERE gender = 'male') AS male
, SUM(views) FILTER (WHERE gender = 'female') AS female
FROM reportimpression
WHERE datelocal >= '2019-02-01' AND datelocal < '2019-03-01'
GROUP BY 1
ORDER BY 1;
However, I'd like to be able to also filter this query down based on additional clauses in the WHERE, for example:
SELECT date_part('hour', datelocal) AS hour
, SUM(views) FILTER (WHERE gender = 'male') AS male
, SUM(views) FILTER (WHERE gender = 'female') AS female
FROM reportimpression
WHERE datelocal >= '2019-02-01' AND datelocal < '2019-03-01'
AND network LIKE '%'
GROUP BY 1
ORDER BY 1;
This second query is MUCH slower than the first, although it should be operating on far fewer records, in addition to the fact that it doesn't hit my index.
Table schema:
CREATE TABLE reportimpression (
datelocal timestamp without time zone,
devicename text,
network text,
sitecode text,
advertisername text,
mediafilename text,
gender text,
agegroup text,
views integer,
impressions integer,
dwelltime numeric
);
-- Indices -------------------------------------------------------
CREATE INDEX reportimpression_datelocal_index ON reportimpression(datelocal timestamp_ops);
CREATE INDEX reportimpression_viewership_index ON reportimpression(datelocal timestamp_ops,views int4_ops,impressions int4_ops,gender text_ops,agegroup text_ops);
CREATE INDEX reportimpression_test_index ON reportimpression(datelocal timestamp_ops,(date_part('hour'::text, datelocal)) float8_ops);
Analyze output:
Finalize GroupAggregate (cost=1005368.37..1005385.70 rows=3151 width=24) (actual time=70615.636..70615.649 rows=24 loops=1)
Group Key: (date_part('hour'::text, datelocal))
-> Sort (cost=1005368.37..1005369.94 rows=3151 width=24) (actual time=70615.631..70615.634 rows=48 loops=1)
Sort Key: (date_part('hour'::text, datelocal))
Sort Method: quicksort Memory: 28kB
-> Gather (cost=1005005.62..1005331.75 rows=3151 width=24) (actual time=70615.456..70641.208 rows=48 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Partial HashAggregate (cost=1004005.62..1004016.65 rows=3151 width=24) (actual time=70613.132..70613.152 rows=24 loops=2)
Group Key: date_part('hour'::text, datelocal)
-> Parallel Seq Scan on reportimpression (cost=0.00..996952.63 rows=2821195 width=17) (actual time=0.803..69876.914 rows=2429159 loops=2)
Filter: ((datelocal >= '2019-02-01 00:00:00'::timestamp without time zone) AND (datelocal < '2019-03-01 00:00:00'::timestamp without time zone) AND (network ~~ '%'::text))
Rows Removed by Filter: 6701736
Planning time: 0.195 ms
Execution time: 70641.349 ms
Do I need to create additional indexes, tweak my SELECT, or something else entirely?