This question is related to this one. I have table which contains power values for devices and I need to calculate power consumption for given time span and return 10 most power consuming devices. I have generated 192 devices and 7742208 measurement records (40324 for each). This is roughly how much records devices would produce in one month.
For this amount of data my current query takes over 40s to execute which is too much because time span and amount of devices and measurements could be much higher. Should I try to solve this with different approach than lag() OVER PARTITION and what other optimizations can be made? I would really appreciate suggestions with code examples.
PostgreSQL version 9.4
Query with example values:
SELECT
t.device_id,
sum(len_y*(extract(epoch from len_x))) AS total_consumption
FROM (
SELECT
m.id,
m.device_id,
m.power_total,
m.created_at,
m.power_total+lag(m.power_total) OVER (
PARTITION BY device_id
ORDER BY m.created_at
) AS len_y,
m.created_at-lag(m.created_at) OVER (
PARTITION BY device_id
ORDER BY m.created_at
) AS len_x
FROM
measurements AS m
WHERE m.created_at BETWEEN '2015-07-30 13:05:24.403552+00'::timestamp
AND '2015-08-27 12:34:59.826837+00'::timestamp
) AS t
GROUP BY t.device_id
ORDER BY total_consumption
DESC LIMIT 10;
Table info:
Column | Type | Modifiers
--------------+--------------------------+----------------------------------------------------------
id | integer | not null default nextval('measurements_id_seq'::regclass)
created_at | timestamp with time zone | default timezone('utc'::text, now())
power_total | real |
device_id | integer | not null
Indexes:
"measurements_pkey" PRIMARY KEY, btree (id)
"measurements_device_id_idx" btree (device_id)
"measurements_created_at_idx" btree (created_at)
Foreign-key constraints:
"measurements_device_id_fkey" FOREIGN KEY (device_id) REFERENCES devices(id)
Query plan:
Limit (cost=1317403.25..1317403.27 rows=10 width=24) (actual time=41077.091..41077.094 rows=10 loops=1)
-> Sort (cost=1317403.25..1317403.73 rows=192 width=24) (actual time=41077.089..41077.092 rows=10 loops=1)
Sort Key: (sum((((m.power_total + lag(m.power_total) OVER (?))) * date_part('epoch'::text, ((m.created_at - lag(m.created_at) OVER (?)))))))
Sort Method: top-N heapsort Memory: 25kB
-> GroupAggregate (cost=1041700.67..1317399.10 rows=192 width=24) (actual time=25361.013..41076.562 rows=192 loops=1)
Group Key: m.device_id
-> WindowAgg (cost=1041700.67..1201314.44 rows=5804137 width=20) (actual time=25291.797..37839.727 rows=7742208 loops=1)
-> Sort (cost=1041700.67..1056211.02 rows=5804137 width=20) (actual time=25291.746..30699.993 rows=7742208 loops=1)
Sort Key: m.device_id, m.created_at
Sort Method: external merge Disk: 257344kB
-> Seq Scan on measurements m (cost=0.00..151582.05 rows=5804137 width=20) (actual time=0.333..5112.851 rows=7742208 loops=1)
Filter: ((created_at >= '2015-07-30 13:05:24.403552'::timestamp without time zone) AND (created_at <= '2015-08-27 12:34:59.826837'::timestamp without time zone))
Planning time: 0.351 ms
Execution time: 41114.883 ms
Query to generate test table and data:
CREATE TABLE measurements (
id serial primary key,
device_id integer,
power_total real,
created_at timestamp
);
INSERT INTO measurements(
device_id,
created_at,
power_total
)
SELECT
device_id,
now() + (i * interval '1 minute'),
random()*(50-1)+1
FROM (
SELECT
DISTINCT(device_id),
generate_series(0,10) AS i
FROM (
SELECT
generate_series(1,5) AS device_id
) AS dev_ids
) AS gen_table;