I have a table containing the following columns:
- an integer column named
id
- a text column named
value
- a timestamp column named
creation_date
Currently, indexes have been created for the id
and value
columns.
I must search this table for a given value and want to make search as fast as I can. But I don't really need to look through records that are older than one month. So, ideally I would like to exclude them from the index.
What would be the best way to achieve this:
- Perform table partitioning. Only search through the subtable for the appropriate month.
- Create a partial index including only the recent records. Recreate it every month.
- Something else?
(PS.: "the best solution" means the solution that is the most convenient, fast and easy to maintain)
Partial index
A partial index would be perfect for that, or even a partial multicolumn index. But your condition
don't need to search value in records older than one month
is not stable. The condition of a partial index can only work with literals or IMMUTABLE
functions, i.e., constant values. You mention Recreate it every month
, but that would not agree with your definition older than one month
. You see the difference right?
If you should only need a the current (or last) month, index recreation as well as the query itself become quite a bit simpler!
I'll got with your definition "not older than one month" for the rest of this answer. I had to deal with situations like this before. The following solution worked best for me:
Base your index conditions on a fixed timestamp and use the same timestamp in your queries to convince the query planner it can use the partial index. This kind of partial will stay useful over an extended period of time, only its effectiveness deteriorates as new rows are added and older rows drop out of your time frame. The index will return more and more false positives that an additional WHERE
clause has to eliminate from your query. Recreate the index to update its condition.
Given your test table:
CREATE TABLE mytbl (
value text
,creation_date timestamp
);
Create a very simple IMMUTABLE
SQL function:
CREATE OR REPLACE FUNCTION f_mytbl_start_ts()
RETURNS timestamp AS
$func$
SELECT '2013-01-01 0:0'::timestamp
$func$ LANGUAGE sql IMMUTABLE;
Use the function in the condition of the partial index:
CREATE INDEX mytbl_start_ts_idx ON mytbl(value, creation_date)
WHERE (creation_date >= f_mytbl_start_ts());
value
comes first. Explanation in this related answer on dba.SE.
Input from @Igor in the comments made me improve my answer. A partial multicolumn index should make ruling out false positives from the partial index faster - it's in the nature of the index condition that it's always increasingly outdated (but still a lot better than not having it).
Query
A query like this will make use of the index and should be perfectly fast:
SELECT value
FROM mytbl
WHERE creation_date >= f_mytbl_start_ts() -- !
AND creation_date >= (now() - interval '1 month')
AND value = 'foo';
The only purpose of the seemingly redundant WHERE
clause: creation_date >= f_mytbl_start_ts()
is to make the query planner use the partial index.
You can drop and recreate function and index manually.
Full automation
Or you can automate it in a bigger scheme with possibly lots of similar tables:
Disclaimer: This is advanced stuff. You need to know what you are doing and consider user privileges, possible SQL injection and locking issues with heavy concurrent load!
This "steering table" receives a line per table in your regime:
CREATE TABLE idx_control (
tbl text primary key -- plain, legal table names!
,start_ts timestamp
);
I would put all such meta objects in a separate schema.
For our example:
INSERT INTO idx_control(tbl, value)
VALUES ('mytbl', '2013-1-1 0:0');
A "steering table" offers the additional benefit that you have an overview over all such tables and their respective settings in a central place and you can update some or all of them in sync.
Whenever you change start_ts
in this table the following trigger kicks in and takes care of the rest:
Trigger function:
CREATE OR REPLACE FUNCTION trg_idx_control_upaft()
RETURNS trigger AS
$func$
DECLARE
_idx text := NEW.tbl || 'start_ts_idx';
_func text := 'f_' || NEW.tbl || '_start_ts';
BEGIN
-- Drop old idx
EXECUTE format('DROP INDEX IF EXISTS %I', _idx);
-- Create / change function; Keep placeholder with -infinity for NULL timestamp
EXECUTE format('
CREATE OR REPLACE FUNCTION %I()
RETURNS timestamp AS
$x$
SELECT %L::timestamp
$x$ LANGUAGE SQL IMMUTABLE', _func, COALESCE(NEW.start_ts, '-infinity'));
-- New Index; NULL timestamp removes idx condition:
IF NEW.start_ts IS NULL THEN
EXECUTE format('
CREATE INDEX %I ON %I (value, creation_date)', _idx, NEW.tbl);
ELSE
EXECUTE format('
CREATE INDEX %I ON %I (value, creation_date)
WHERE creation_date >= %I()', _idx, NEW.tbl, _func);
END IF;
RETURN NULL;
END
$func$ LANGUAGE plpgsql;
Trigger:
CREATE TRIGGER upaft
AFTER UPDATE ON idx_control
FOR EACH ROW
WHEN (OLD.start_ts IS DISTINCT FROM NEW.start_ts)
EXECUTE PROCEDURE trg_idx_control_upaft();
Now, a simple UPDATE
on the steering table calibrates index and function:
UPDATE idx_control
SET start_ts = '2013-03-22 0:0'
WHERE tbl = 'mytbl';
You can run a cron job or call this manually.
Queries using the index don't change.
-> SQLfiddle.
I updated the fiddle with a small test case of 10k rows to demonstrate it works.
PostgreSQL will even do an index-only scan for my example query. Won't get any faster than this.