I am wondering if anyone knows of a data structure which would efficiently handle the following situation:
The data structure should store several, possibly overlapping, variable length ranges on some continuous timescale.
For example, you might add the ranges a:[0,3], b:[4,7], c:[0,9]
.
Insertion time does not need to be particularly efficient.
Retrievals would take a range as a parameter, and return all the ranges in the set that overlap with the range, for example:
Get(1,2)
would return a and c. Get(6,7)
would return b and c. Get(2,6)
would return all three.
Retrievals need to be as efficient as possible.
One data structure you could use is a one-dimensional R-tree. These are designed to deal with ranges and to provide efficient retrieval. You will also learn about Allen's Operators; there are a dozen other relationships between time intervals than just 'overlaps'.
There are other questions on SO that impinge on this area, including:
- Determine Whether Two Date Ranges Overlap
- Data structure for non-overlapping ranges within a single dimension
You could go for a binary tree, that stores the ranges in a hierarchy. Starting from the root node, that represents an all-encompassing range divided it its middle, you test if your range you are trying to insert belong to the left subrange, right subrange, or both, and recursively carry on in the matching subnodes until you reach a certain depth, at which you save the actual range.
For lookup, you test your input range against the left and right subranges of the top node, and dive in the ones which overlap, repeating until you have reached actual ranges that you save.
This way, retrieval has a logarithmic complexity. You'd still need to manage duplicates in your retrieval, as some ranges are going to belong to several nodes.