Data structure for non-overlapping ranges within a

2019-02-06 23:03发布

问题:

I need a data structure that can store non-overlapping ranges within a single dimension. The entire range of the dimension need not be completely covered.

An example would be a conference room scheduler. The dimension is time. No two schedules may overlap. The conference room isn't always scheduled. In other words, for a given time there can be at most one schedule.

A quick solution is for a range to store the start and end times.

Range {
    Date start
    Date end
}

This is non-normalized and requires the container to enforce no overlapping. For two adjacent ranges, the previous' end will be redundant with the next's start.

Another scheme might involve storing one boundary value with each range. But for a contiguous sequence of ranges, there will always be one more boundary values than ranges. To get around this the sequence could be represented as alternating boundary values and ranges:

B = boundary value, r = range

B-r-B-r-B

The data structure might look like:

Boundary {
    Date value
    Range prev
    Range next
}

Range {
    Boundary start
    Boundary end
}

In essence it's a doubly linked list with alternating types.

Ultimately, whatever data structure I use will be represented in both memory (application code) and a relational database.

I'm curious what academic or industry tried solutions exists.

回答1:

The normalized way to represent your data would be to store a record for each unit of time. This can be done in the example of the conference scheduling application. Your constraint would be a unique constraint for

(RoomId, StartTime)

In the case of continuous ranges, you necessarily need to store 2 things, one boundary and either the second boundary or the length. It is usually done by storing the second boundary and then creating a constraint on both boundary of the kind

(boundary not between colBoudaryA and colBoundaryB)

with the additional constraint that

(startBoundary < endBoundary)


回答2:

A doubly linked list works well because you only use as much memory as you have filled ranges, and you need only check overlapping on insertions - it's almost trivial to do so at that point. If there's overlap the new item is rejected.

RoomID
ReservationID
PreviousReservationID
NextReservationID
StartTimeDate
EndTimeDate
Priority
UserID

The priority and UserID allow for schedules to have priorities (professor might have more clout than a student group) so that a new item can 'knock' the lower priority items out of the way during an insertion, and the UserID allows an email to be sent to the bumped meeting organizers.

You'll want to consider adding a table that points to the first meeting of each day so that searches can be optimized.

-Adam



回答3:

  1. For non-overlapping intervals you could just sort you intervals with starting point. When you add a new interval to this structure, you could just check that start and end points do not belong to this interval set. To check whether some point X belong interval set you could use binary search to find the nearest start point and check that X belongs it's interval. This approach is not so optimal for modify operations.

  2. You could look at Interval tree structure - for non-overlapping intervals it has optimal query and modify operations.



回答4:

If you are lucky (!) enough to be using Postgres, you can use a tstzrange column, and apply a constraint to prevent overlaps. The bonus of using a range type is that it will inherently prevent start being greater than finish.

ALTER TABLE "booking" 
ADD CONSTRAINT "overlapping_bookings" 
EXCLUDE USING gist ("period" WITH &&, "room" WITH =);

You may need to CREATE EXTENSION IF NOT EXISTS btree_gist, as creating a gist using && is not supported without that extension.



回答5:

A lot depends on what you'll be doing with the data, and therefore which operations need to be efficient. However, I'd consider a doubly linked list of Ranges with logic in the setters of Start and End to check whether it now overlaps its neighbours, and to shrink them if so (or throw an exception, or however you want to handle an attempted overlap).

That gives a nice simple linked list of booked periods to read, but no container responsible for maintaining the no-overlap rule.



回答6:

This is called the "Unary Resource" constraint in the Constraint Programming world. There is a lot of research in this area, specifically for the case when the event times aren't fixed, and you need to find time-slots for each of them. There is a commercial C++ package that does your problem and more Ilog CP, but it is likely overkill. There is also a somewhat open-source version called eclipse (no relation to the IDE).



回答7:

This is non-trivial because (in the database world) you have to compare multiple rows to determine non-overlapping ranges. Clearly, when the information is in memory, then other representations such as lists in time order are possible. I think, though, that you'd be best off with your 'start + end' notation, even in a list.

There are whole books on the subject - part of 'Temporal Database' handling. Two you could look at are Darwen, Date and Lorentzos "Temporal Data and the Relational Model" and (at a radically different extreme) "Developing Time-Oriented Database Applications in SQL", Richard T. Snodgrass, Morgan Kaufmann Publishers, Inc., San Francisco, July, 1999, 504+xxiii pages, ISBN 1-55860-436-7. That is out of print but available as PDF on his web site at cs.arizona.edu (so a Google search makes it pretty easy to find).

One of the relevant data structures is, I believe, an R-Tree. That is often used for 2-dimensional structures, but can also be effective for 1-dimensional structures.

You can also look for "Allen's Relations" for intervals - they may be helpful to you.



回答8:

I've had success storing a beginning time and duration. The test for overlap would be something like

WHERE NOT EXISTS (
   SELECT 1 FROM table
   WHERE BeginTime < NewBeginTime AND BeginTime + Duration > NewBeginTime
)
AND NOT EXISTS (
   SELECT 1 FROM table
   WHERE NewBeginTime < BeginTime AND NewBeginTime + NewDuration > BeginTime
)

I think without testing, but hopefully you get the drift