I have the following table
create table log
(
id bigint default nextval('log_id_seq'::regclass) not null
constraint log_pkey
primary key,
level integer,
category varchar(255),
log_time timestamp,
prefix text,
message text
);
It contains like 3 million of rows.
I'm comparing the following queries:
EXPLAIN SELECT id
FROM log
WHERE log_time < now() - INTERVAL '3 month'
LIMIT 100000
which yields the following plan:
Limit (cost=0.00..19498.87 rows=100000 width=8)
-> Seq Scan on log (cost=0.00..422740.48 rows=2168025 width=8)
Filter: (log_time < (now() - '3 mons'::interval))
And the same query with ORDER BY id instruction added:
EXPLAIN SELECT id
FROM log
WHERE log_time < now() - INTERVAL '3 month'
ORDER BY id ASC
LIMIT 100000
which yields
Limit (cost=0.43..25694.15 rows=100000 width=8)
-> Index Scan using log_pkey on log (cost=0.43..557048.28 rows=2168031 width=8)
Filter: (log_time < (now() - '3 mons'::interval))
I have the following questions:
The absence of ORDER BY instruction allows Postgres not to care about the order of rows. They may be as well delivered sorted. Why it does not use index without ORDER BY?
- How can Postgres use index in the first place in such a query?
WHERE
clause of the query contains a non-indexed column and to fetch that column, sequential database scan will be required, but the query withORDER BY
doesn't indicate that.
- How can Postgres use index in the first place in such a query?
The Postgres manual page says:
For a query that requires scanning a large fraction of the table, an explicit sort is likely to be faster than using an index because it requires less disk I/O due to following a sequential access pattern
Can you please clarify this statement for me? Index is always ordered. And reading an ordered structure is always faster, it is always a sequential access (at least in terms of page scanning) than reading non-ordered data and then ordering it manually.