Are execution times of these SQL queries the same?

Posted 2019-09-23 16:00

Question:

I believe these two queries return the same result. Is that correct?

The first query:

SELECT 
  sensor_id,
  measurement_time,
  measurement_value
FROM 
  public.measurement_pm2_5
  WHERE (sensor_id = 12 AND measurement_time BETWEEN to_timestamp(3000) AND to_timestamp(12000))
  OR (sensor_id = 27 AND measurement_time BETWEEN to_timestamp(3000) AND to_timestamp(12000))
  OR (sensor_id = 1 AND measurement_time BETWEEN to_timestamp(500) AND to_timestamp(1000))
  OR (sensor_id = 1 AND measurement_time BETWEEN to_timestamp(6000) AND to_timestamp(9000));

The second query:

SELECT 
  sensor_id,
  measurement_time,
  measurement_value
FROM 
  public.measurement_pm2_5
  WHERE (sensor_id in (12,27) AND measurement_time BETWEEN to_timestamp(3000) AND to_timestamp(12000))
  OR (sensor_id = 1 AND ((measurement_time BETWEEN to_timestamp(500) AND to_timestamp(1000)) OR (measurement_time BETWEEN to_timestamp(6000) AND to_timestamp(9000))));
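
Since `sensor_id IN (12,27)` is shorthand for two equality tests and AND distributes over OR, the two WHERE clauses should be logically equivalent. That can also be checked against the data. The following is a minimal sketch, not part of the original question: it wraps both queries in CTEs (the names q1 and q2 are made up for this example) and lists every row that only one of them returns.

```sql
-- q1 is the first query and q2 the second, copied unchanged.
WITH q1 AS (
  SELECT sensor_id, measurement_time, measurement_value
  FROM public.measurement_pm2_5
  WHERE (sensor_id = 12 AND measurement_time BETWEEN to_timestamp(3000) AND to_timestamp(12000))
     OR (sensor_id = 27 AND measurement_time BETWEEN to_timestamp(3000) AND to_timestamp(12000))
     OR (sensor_id = 1 AND measurement_time BETWEEN to_timestamp(500) AND to_timestamp(1000))
     OR (sensor_id = 1 AND measurement_time BETWEEN to_timestamp(6000) AND to_timestamp(9000))
),
q2 AS (
  SELECT sensor_id, measurement_time, measurement_value
  FROM public.measurement_pm2_5
  WHERE (sensor_id IN (12, 27) AND measurement_time BETWEEN to_timestamp(3000) AND to_timestamp(12000))
     OR (sensor_id = 1 AND (measurement_time BETWEEN to_timestamp(500) AND to_timestamp(1000)
                         OR measurement_time BETWEEN to_timestamp(6000) AND to_timestamp(9000)))
)
-- Rows that appear in one result but not in the other (duplicates counted).
SELECT 'only in first' AS source, d1.*
FROM (SELECT * FROM q1 EXCEPT ALL SELECT * FROM q2) AS d1
UNION ALL
SELECT 'only in second' AS source, d2.*
FROM (SELECT * FROM q2 EXCEPT ALL SELECT * FROM q1) AS d2;
```

An empty result means the two queries agree on the rows currently in the table. It is a sanity check on the data at hand, not a proof.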

What about execution time? How big is the difference, if any?

The first query:

Start-up Cost: 0
Total Cost: 580.56
Number of Rows: 1
Row Width: 18
Start-up Time: 2.676
Total Time: 2.676
Real Number of Rows: 0
Loops: 1

The second query:

Start-up Cost: 0
Total Cost: 456.17
Number of Rows: 1
Row Width: 18
Start-up Time: 2.237
Total Time: 2.237
Real Number of Rows: 0
Loops: 1

@Mike's query:

Hash Join  (cost=0.10..280.06 rows=115 width=18) (actual time=8.596..8.596 rows=0 loops=1)
  Hash Cond: (p.sensor_id = "*VALUES*".column1)
  Join Filter: ((p.measurement_time >= to_timestamp(("*VALUES*".column2)::double precision)) AND (p.measurement_time <= to_timestamp(("*VALUES*".column3)::double precision)))
  Rows Removed by Join Filter: 590
  ->  Seq Scan on measurement_pm2_5 p  (cost=0.00..207.39 rows=12439 width=18) (actual time=0.010..2.558 rows=12443 loops=1)
  ->  Hash  (cost=0.05..0.05 rows=4 width=12) (actual time=0.017..0.017 rows=4 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  Values Scan on "*VALUES*"  (cost=0.00..0.05 rows=4 width=12) (actual time=0.002..0.003 rows=4 loops=1)
Planning time: 0.148 ms
Execution time: 8.627 ms

The question is whether the difference in execution time between these two queries becomes significant when they are run against a large database.

Answer 1:

Try this:

SELECT 
  sensor_id,
  measurement_time,
  measurement_value
FROM 
  public.measurement_pm2_5 p,
  ( values(12,3000,12000),(27,3000,12000),(1,500,1000),(1,6000,9000) ) as t(sens,t1,t2)
  WHERE p.sensor_id = t.sens
    AND measurement_time BETWEEN to_timestamp(t.t1) AND to_timestamp(t.t2);

This approach is usually faster than any combination of OR and IN.
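
The plan posted in the question for this query reads the whole table with a Seq Scan before joining it to the VALUES list. On a much larger table, the cost may depend more on whether an index can serve the sensor_id and measurement_time conditions than on how the WHERE clause is written. A minimal sketch, assuming no suitable index exists yet (the index name is hypothetical and not from the original thread):

```sql
-- Hypothetical composite index: equality column first, range column second.
CREATE INDEX measurement_pm2_5_sensor_time_idx
  ON public.measurement_pm2_5 (sensor_id, measurement_time);

-- Refresh planner statistics so the new index is costed with current data.
ANALYZE public.measurement_pm2_5;
```

Whether the planner actually uses such an index depends on table size and selectivity, so the plans are worth comparing again afterwards.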



Answer 2:

EXPLAIN ANALYZE -- paste the first query here

For example: EXPLAIN ANALYZE SELECT * FROM employee;

You will get a full explanation of the query plan, including the actual time taken by each node of the plan.
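
Applied to the first query from the question, that could look like the sketch below. The BUFFERS option is an addition not mentioned in this answer; it reports how many pages each plan node touched. Keep in mind that EXPLAIN ANALYZE really executes the statement, so timings vary between runs and are best compared over several executions.

```sql
-- Prints the chosen plan with estimated costs, actual times and row counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
  sensor_id,
  measurement_time,
  measurement_value
FROM
  public.measurement_pm2_5
WHERE (sensor_id = 12 AND measurement_time BETWEEN to_timestamp(3000) AND to_timestamp(12000))
   OR (sensor_id = 27 AND measurement_time BETWEEN to_timestamp(3000) AND to_timestamp(12000))
   OR (sensor_id = 1 AND measurement_time BETWEEN to_timestamp(500) AND to_timestamp(1000))
   OR (sensor_id = 1 AND measurement_time BETWEEN to_timestamp(6000) AND to_timestamp(9000));
```

Running the same statement with the second query and with the VALUES variant gives three plans that can be compared node by node.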