Data structure for non-overlapping ranges within a

2019-02-06 22:50发布

I need a data structure that can store non-overlapping ranges within a single dimension. The entire range of the dimension need not be completely covered.

An example would be a conference room scheduler. The dimension is time. No two schedules may overlap. The conference room isn't always scheduled. In other words, for a given time there can be at most one schedule.

A quick solution is for a range to store the start and end times.

Range {
    Date start
    Date end
}

This is non-normalized and requires the container to enforce no overlapping. For two adjacent ranges, the previous' end will be redundant with the next's start.

Another scheme might involve storing one boundary value with each range. But for a contiguous sequence of ranges, there will always be one more boundary values than ranges. To get around this the sequence could be represented as alternating boundary values and ranges:

B = boundary value, r = range

B-r-B-r-B

The data structure might look like:

Boundary {
    Date value
    Range prev
    Range next
}

Range {
    Boundary start
    Boundary end
}

In essence it's a doubly linked list with alternating types.

Ultimately, whatever data structure I use will be represented in both memory (application code) and a relational database.

I'm curious what academic or industry tried solutions exists.

8条回答
Summer. ? 凉城
2楼-- · 2019-02-06 23:07

The normalized way to represent your data would be to store a record for each unit of time. This can be done in the example of the conference scheduling application. Your constraint would be a unique constraint for

(RoomId, StartTime)

In the case of continuous ranges, you necessarily need to store 2 things, one boundary and either the second boundary or the length. It is usually done by storing the second boundary and then creating a constraint on both boundary of the kind

(boundary not between colBoudaryA and colBoundaryB)

with the additional constraint that

(startBoundary < endBoundary)
查看更多
▲ chillily
3楼-- · 2019-02-06 23:08

This is non-trivial because (in the database world) you have to compare multiple rows to determine non-overlapping ranges. Clearly, when the information is in memory, then other representations such as lists in time order are possible. I think, though, that you'd be best off with your 'start + end' notation, even in a list.

There are whole books on the subject - part of 'Temporal Database' handling. Two you could look at are Darwen, Date and Lorentzos "Temporal Data and the Relational Model" and (at a radically different extreme) "Developing Time-Oriented Database Applications in SQL", Richard T. Snodgrass, Morgan Kaufmann Publishers, Inc., San Francisco, July, 1999, 504+xxiii pages, ISBN 1-55860-436-7. That is out of print but available as PDF on his web site at cs.arizona.edu (so a Google search makes it pretty easy to find).

One of the relevant data structures is, I believe, an R-Tree. That is often used for 2-dimensional structures, but can also be effective for 1-dimensional structures.

You can also look for "Allen's Relations" for intervals - they may be helpful to you.

查看更多
放荡不羁爱自由
4楼-- · 2019-02-06 23:10

A doubly linked list works well because you only use as much memory as you have filled ranges, and you need only check overlapping on insertions - it's almost trivial to do so at that point. If there's overlap the new item is rejected.

RoomID
ReservationID
PreviousReservationID
NextReservationID
StartTimeDate
EndTimeDate
Priority
UserID

The priority and UserID allow for schedules to have priorities (professor might have more clout than a student group) so that a new item can 'knock' the lower priority items out of the way during an insertion, and the UserID allows an email to be sent to the bumped meeting organizers.

You'll want to consider adding a table that points to the first meeting of each day so that searches can be optimized.

-Adam

查看更多
放我归山
5楼-- · 2019-02-06 23:11

A lot depends on what you'll be doing with the data, and therefore which operations need to be efficient. However, I'd consider a doubly linked list of Ranges with logic in the setters of Start and End to check whether it now overlaps its neighbours, and to shrink them if so (or throw an exception, or however you want to handle an attempted overlap).

That gives a nice simple linked list of booked periods to read, but no container responsible for maintaining the no-overlap rule.

查看更多
Bombasti
6楼-- · 2019-02-06 23:12
  1. For non-overlapping intervals you could just sort you intervals with starting point. When you add a new interval to this structure, you could just check that start and end points do not belong to this interval set. To check whether some point X belong interval set you could use binary search to find the nearest start point and check that X belongs it's interval. This approach is not so optimal for modify operations.

  2. You could look at Interval tree structure - for non-overlapping intervals it has optimal query and modify operations.

查看更多
仙女界的扛把子
7楼-- · 2019-02-06 23:15

I've had success storing a beginning time and duration. The test for overlap would be something like

WHERE NOT EXISTS (
   SELECT 1 FROM table
   WHERE BeginTime < NewBeginTime AND BeginTime + Duration > NewBeginTime
)
AND NOT EXISTS (
   SELECT 1 FROM table
   WHERE NewBeginTime < BeginTime AND NewBeginTime + NewDuration > BeginTime
)

I think without testing, but hopefully you get the drift

查看更多
登录 后发表回答