I'm developing a programming language for which I want to provide a Range data type which for now is, not as usually, a list of pairs of int
values (x,y)
with the constraint that x < y
. I say not as usually because usually a range is just a pair but in my case it is more than than, allowing to have for example
1 to 5, 7 to 11, 13 to 22
all contained in a single object.
I would like to provide two functions to generate the union and the instersection of two ranges, that should contain the least number of non-overlapping intervals from a couple of ranges.. so for example
1 to 5 || 3 to 8 = 1 to 8
1 to 5 && 3 to 8 = 3 to 5
(1 to 3, 4 to 8) && 2 to 6 = (2 to 3, 4 to 6)
where ||
is union and &&
is intersection.
For now their implementation is, as stated before, just a list.. I know that a more suitable data structure exists (interval tree) but for now I'm more concerned in building new lists from the union/intersection of other lists..
Which are the state-of-the-art algorithms to implement these two functions?
Thanks in advance
It seems to me that the best way of storing intervals - interval trees - is also the means to perform operations on them. Since doing point-tree intersections is the main case supported by interval tree query, it doesn't seem to be too hard to extend that to interval-interval intersection: for each interval in
tree1
, querytree2
for both endpoints. For intersection, subtract the intersecting interval fromtree1
, and for union, add the intersecting interval. For each subtract/add operation, you'll get a new set of intervals to add to your newtree3
, which will end up being the result of your operation.Without interval trees ..... No special ordering needed ... Probably not "state of the art" :)
Edit>
Perhaps the generalization to n arguments is better than the diadic functions.
Something like> (sorry for the non standard pseudocode)
Since you don't want to deal with interval trees and use only lists, I would suggest you keep your range representation as a list of disjoint intervals sorted by the x coordinates.
Given two lists, you can compute the union and intersection by doing a kind of merge step like we do in merge of sorted lists.
Example:
For union of L1 and L2:
Create L to be empty List.
Take the interval with smallest x from L1 and L2 and put onto L.
Now take smallest again, compare with last element of L, and decide whether they need to be merged (if overlap) or a new interval appended (if they don't overlap) at the end of L and continue processing, like we do in merging two sorted lists.
Once you are done, L will be the union whose intervals are sorted by x and are disjoint.
For intersection, you can do something similar.
I'm going to assume that each range is itself non-overlapping, minimal, and is ordered.
If that's the case, you can just "zip them up":
Afterwards you can go through and merge neighbouring intervals if necessary.
Some minor optimizations may be available, but this basically implements the union operation, and should show the general process for implementing the intersection.
I'll post my own implementation so far (just union operation), I'm using a functional language so be warned.. it may be confusing:
It uses the
c
argument to store the current interval (on which merge overlapping ranges) whilel1
andl2
are twoint*int list
.Syntax is easy, every row represent a single case that has
c
,l1
andl2
specified in a precise way. Let me give you just some examples:I have a current range
fc,tc
and the two lists contains at least one element (that's why of(f1,t1)::y1
), so ift1 <= fc
then the range of first list ends before the current and i can be discarded so it calls recursively itself with samecur
range,y1
in which the first one is discarded andn2
that is an alias for the same list received as an argument.This is similar for every other case, when I find that no next range overlaps with
cur
the I returncur
as an element of the final answer and start again from an empty one.