PostgreSQL matching interval between start and end

Posted on 2020-02-05 03:52

I am designing some system that will store records containing a begin and end time. For example:

CREATE TABLE test (
  id bigserial PRIMARY KEY,
  ts_start timestamp NOT NULL,
  ts_end timestamp NOT NULL,
  foo bar NOT NULL,
  ...
);

Now I want to run queries on this to find all rows that overlap with a certain timestamp. This would result in a where clause like:

WHERE ts_start <= '2006-4-6 12:34:56' AND ts_end > '2006-4-6 12:34:56'

I tested this with a massive amount of generated test data and the performance was quite bad. I tried an index on ts_start, another index on ts_end, and also a multi-column index on (ts_start, ts_end). The last gave the best result, but it is still far from optimal.

The problem is that PostgreSQL doesn't know that ts_end is guaranteed to be larger than ts_start, so it uses a plan that is also capable of finding rows where ts_end is smaller than ts_start.

Any suggestions how to solve this problem?

Edit: For people having this problem too: if you can wait a little longer, PostgreSQL 9.2 has the perfect solution: range types. 9.2 is in beta now; the final release will most likely come at the end of 2012.
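For readers on 9.2 or later, a minimal sketch of the range-type approach (the column name `during` and the index name are my own, not from the question):

```sql
CREATE TABLE test (
  id bigserial PRIMARY KEY,
  during tsrange NOT NULL  -- replaces the ts_start/ts_end pair
);

-- A GiST index supports the range operators (@>, &&, <@, ...)
CREATE INDEX test_during_idx ON test USING gist (during);

-- All rows whose interval contains a given timestamp:
SELECT * FROM test
WHERE during @> '2006-04-06 12:34:56'::timestamp;
```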

Tags: postgresql
3 Answers
家丑人穷心不美
#2 · 2020-02-05 04:29

There was a "temporal postgres" project (google it), but I don't know if it is still maintained... I believe there was a discussion about including this type of search in Postgres, but I don't remember how it ended. Anyway:

Example using BOX and GiST:

CREATE TABLE segments( start INTEGER NOT NULL, stop INTEGER NOT NULL, range_box BOX NOT NULL );
INSERT INTO segments SELECT n,n+1,BOX(POINT(n,-1),POINT(n+1,1)) FROM generate_series( 1, 1000000 ) n;
CREATE INDEX segments_box ON segments USING gist( range_box );
CREATE INDEX segments_start ON segments(start);
CREATE INDEX segments_stop ON segments(stop);

EXPLAIN ANALYZE SELECT * FROM segments WHERE 300000 BETWEEN start AND stop;
 Index Scan using segments_start on segments  (cost=0.00..12959.24 rows=209597 width=72) (actual time=91.990..91.990 rows=2 loops=1)
   Index Cond: (300000 >= start)
   Filter: (300000 <= stop)
 Total runtime: 92.023 ms

EXPLAIN ANALYZE SELECT * FROM segments WHERE range_box && '(300000,0,300000,0)'::BOX;
 Bitmap Heap Scan on segments  (cost=283.49..9740.27 rows=5000 width=72) (actual time=0.036..0.037 rows=2 loops=1)
   Recheck Cond: (range_box && '(300000,0),(300000,0)'::box)
   ->  Bitmap Index Scan on segments_box  (cost=0.00..282.24 rows=5000 width=0) (actual time=0.032..0.032 rows=2 loops=1)
         Index Cond: (range_box && '(300000,0),(300000,0)'::box)
 Total runtime: 0.064 ms

As you can see, the GiST index is ridiculously fast here (about 1500 times faster!), and you can use many operators: overlaps, is contained by, contains, etc.

http://www.postgresql.org/docs/8.2/static/functions-geometry.html

狗以群分
#3 · 2020-02-05 04:46

You're encountering the same problem as someone trying to index line segments and then query whether a point lies within a segment. You just can't do that by indexing each dimension separately; you need something that indexes by building some sort of BSP-like spatial structure.

I'm not sure if PG has any built-in data type to support date ranges, but I'm certain that if you use PostGIS to represent the time ranges as points in 2D space and then tell PG to geo-index that, you'll get the best possible performance out of this query.

Maybe there's a date-specific equivalent of my suggestion built into PG, but, as I said, I'm not familiar with it. I am familiar with PG's geometric indexing capabilities, though, and I think you should seriously consider them as an optimization.

Here's a naive example (although I'm sure it will be very fast to query):

  1. Represent each time range as the square with corners (from, from) and (to, to).
  2. Turn on geo indexing over those squares.
  3. Given a query time P, a range covers P exactly when its square contains the point (P, P), that is, when from <= P <= to; you can test this with a function like ST_Contains. Backed by the geo index, this query is O(log(number of ranges)).

illustration:

               |
               |
               |                     _________ (to,to)
        to     |                    |         |
  (timestamp)  |                    |  *(p,p) |
               |                    |_________|
               |               (from,from)
               |__________________________________
                               from (timestamp)
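This idea can also be sketched with plain PostgreSQL geometric types, no PostGIS required (the table and column names are mine, and timestamps are assumed to be encoded as epoch seconds):

```sql
CREATE TABLE ranges (
  id bigserial PRIMARY KEY,
  ts_start timestamp NOT NULL,
  ts_end timestamp NOT NULL,
  -- the square with corners (from, from) and (to, to), in epoch seconds
  range_sq box NOT NULL
);

CREATE INDEX ranges_sq_idx ON ranges USING gist (range_sq);

-- Rows covering the timestamp P (here P = 1144326896 epoch seconds),
-- expressed as "square contains the degenerate box (P,P)-(P,P)":
SELECT * FROM ranges
WHERE range_sq @> box(point(1144326896, 1144326896),
                      point(1144326896, 1144326896));
```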
(account banned)
#4 · 2020-02-05 04:51

The problem is that PostgreSQL doesn't know that ts_end is guaranteed to be larger than ts_start, so it uses a plan that is also capable of finding rows where ts_end is smaller than ts_start.

In situations like this you need to re-express your query so as to give Postgres that information.

This is much like what you'd do when querying against lft/rgt in a nested set: the invariants guarantee parent_lft < lft < rgt < parent_rgt for every descendant, so the optimal query relies on parent_lft < lft AND lft < parent_rgt (which scans a small range of the index on lft) rather than parent_lft < lft AND rgt < parent_rgt (which scans the index on lft from one point onward).
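Spelled out in SQL (the `tree` table and a parent with (lft, rgt) = (10, 20) are assumed for illustration):

```sql
-- Both forms return the parent's descendants, but only the first
-- bounds lft on both sides, so the index on lft scans a small window:
SELECT * FROM tree WHERE lft > 10 AND lft < 20;

-- This open-ended form scans the lft index from 10 onward and
-- filters on rgt row by row:
SELECT * FROM tree WHERE lft > 10 AND rgt < 20;
```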

You're in a similar situation with the index you're adding: unless you put two-sided constraints on ts_start and/or ts_end, you're going to look at a large set of rows.

Now I want to run queries on this to find all rows that overlap with a certain timestamp. This would result in a where clause like:

WHERE ts_start <= '2006-4-6 12:34:56' AND ts_end > '2006-4-6 12:34:56'

For that particular query, you might want to look into geometry types and using a GIST index.

Specifically, if you floor ts_start and ceil ts_end to midnight, you get an integer representation of each bound (the number of days since the epoch, for instance). You can then store that pair in an indexable geometric type and query it using an overlap condition.
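One way to read that suggestion, reusing the box/GiST trick from the other answer (the epoch-day encoding and all names here are my assumptions, not the answerer's):

```sql
ALTER TABLE test ADD COLUMN day_box box;

-- Flatten each interval to whole days since the epoch:
UPDATE test SET day_box = box(
  point(floor(extract(epoch FROM ts_start) / 86400), 0),
  point(ceil(extract(epoch FROM ts_end) / 86400), 0));

CREATE INDEX test_day_box_idx ON test USING gist (day_box);

-- Candidates overlapping day 13244 (2006-04-06); the exact timestamp
-- bounds are rechecked after the coarse index lookup:
SELECT * FROM test
WHERE day_box && box(point(13244, 0), point(13244, 0))
  AND ts_start <= '2006-04-06 12:34:56'
  AND ts_end   >  '2006-04-06 12:34:56';
```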

As a side note, there was some discussion in recent months on the pg-hackers list about adding some kind of timestamp segment/event type, but I'm miserably failing to find the relevant references through googling. So I'm mentioning it here, in case you're luckier than I was.
