I'm developing a programming language for which I want to provide a Range data type which for now is, not as usually, a list of pairs of int
values (x,y)
with the constraint that x < y
. I say not as usually because usually a range is just a pair but in my case it is more than than, allowing to have for example
1 to 5, 7 to 11, 13 to 22
all contained in a single object.
I would like to provide two functions to generate the union and the instersection of two ranges, that should contain the least number of non-overlapping intervals from a couple of ranges.. so for example
1 to 5 || 3 to 8 = 1 to 8
1 to 5 && 3 to 8 = 3 to 5
(1 to 3, 4 to 8) && 2 to 6 = (2 to 3, 4 to 6)
where ||
is union and &&
is intersection.
For now their implementation is, as stated before, just a list.. I know that a more suitable data structure exists (interval tree) but for now I'm more concerned in building new lists from the union/intersection of other lists..
Which are the state-of-the-art algorithms to implement these two functions?
Thanks in advance
Since you don't want to deal with interval trees and use only lists, I would suggest you keep your range representation as a list of disjoint intervals sorted by the x coordinates.
Given two lists, you can compute the union and intersection by doing a kind of merge step like we do in merge of sorted lists.
Example:
For union of L1 and L2:
Create L to be empty List.
Take the interval with smallest x from L1 and L2 and put onto L.
Now take smallest again, compare with last element of L, and decide whether they need to be merged (if overlap) or a new interval appended (if they don't overlap) at the end of L and continue processing, like we do in merging two sorted lists.
Once you are done, L will be the union whose intervals are sorted by x and are disjoint.
For intersection, you can do something similar.
It seems to me that the best way of storing intervals - interval trees - is also the means to perform operations on them. Since doing point-tree intersections is the main case supported by interval tree query, it doesn't seem to be too hard to extend that to interval-interval intersection: for each interval in tree1
, query tree2
for both endpoints. For intersection, subtract the intersecting interval from tree1
, and for union, add the intersecting interval. For each subtract/add operation, you'll get a new set of intervals to add to your new tree3
, which will end up being the result of your operation.
Without interval trees .....
No special ordering needed ...
Probably not "state of the art" :)
(* "{ ... }" means "list" *)
function Union [{x1_,x2_},{y1_,y2_}] :=
if {x1,x2}=={} then {x1,x2}={y1,y2} (* This is needed because the intersection *)
if {y1,y2}=={} then {y1,y2}={x1,x2} (* may return an empty interval *)
(* so, empty intervals need this code *)
if {y1,y2}=={} then return[{}] (* Both intervals were empty! *)
if Min[x1,x2] > Min[y1,y2]
then
return[Union[{y1,y2},{x1,x2}]] (* or swap intervals *)
else
If Max[x1,x2] < Min[y1,y2]
then (* Non Overlapping*)
return[{{x1,x2},{y1,y2}}]
else (* Overlapping intervals *)
return[{Min[x1,x2],Max[x1,x2,y1,y2]}]
end function <Union>
function Intersection [{x1_,x2_},{y1_,y2_}] :=
if ({x1,x2}=={} OR {y1,y2}=={}) then return[{}] (* empty intervals do exist *)
if Min[x1,x2] > Min[y1,y2]
then
return[Intersection[{y1,y2},{x1,x2}]] (* or swap intervals *)
else
If Max[x1,x2] < Min[y1,y2]
then (* Non Overlapping*)
return[{}] (* HERE we create an empty interval*)
else (* Overlapping intervals *)
return[{Min[y1,y2],Min[Max[x1,x2],Max[y1,y2]]}]
end function <Intersection>
Edit>
Perhaps the generalization to n arguments is better than the diadic functions.
Something like> (sorry for the non standard pseudocode)
function GenUnion[{L:list of intervals}]:=
if (Tail[L] is interval)
return[Union[Head[L],Tail[L]]]
else
return[Union[Head[L], GenUnion[Head[Tail[L]],Tail[Tail[L]]]
end function <GenUnion>
I'm going to assume that each range is itself non-overlapping, minimal, and is ordered.
If that's the case, you can just "zip them up":
- Take the interval that starts earliest
- If the next interval from the other range starts after this one ends, output this interval and goto 1
- If the next interval from the other range ends before this one, discard that other interval and goto 2
- Set the start of the other interval equal to the start of this one, discard this interval, and goto 1
Afterwards you can go through and merge neighbouring intervals if necessary.
Some minor optimizations may be available, but this basically implements the union operation, and should show the general process for implementing the intersection.
I'll post my own implementation so far (just union operation), I'm using a functional language so be warned.. it may be confusing:
let rec calc c l1 l2 =
match c,l1,l2 with
None, (f1,t1) :: y1, ((f2,t2) :: y2 as n2) when f1 < f2 -> calc (Some (f1,t1)) y1 n2
| None, n1, (f2,t2) :: y2 -> calc (Some (f2,t2)) n1 y2
| None, _, _ -> []
| (Some (fc,tc) as cur), (f1,t1) :: y1, ((f2,t2) :: y2 as n2) when t1 <= fc -> calc cur y1 n2
| (Some (fc,tc) as cur), ((f1,t1) :: y1 as n1), (f2,t2) :: y2 when t2 <= fc -> calc cur n1 y2
| Some (fc,tc), (f1,t1) :: y1, ((f2,t2) :: y2 as n2) when f1 <= tc && t1 > fc -> calc (Some (fc,t1)) y1 n2
| Some (fc,tc), ((f1,t1) :: y1 as n1), (f2,t2) :: y2 when f2 <= tc && t2 > fc -> calc (Some (fc,t2)) n1 y2
| Some (fc,tc), (f1,t1) :: y1, ((f2,t2) :: y2 as n2) when f1 < f2 -> [fc,tc] @ calc (Some (f1,t1)) y1 n2
| Some (fc,tc), (t :: e as n1), (f2,t2) :: y2 -> [fc,tc] @ calc (Some (f2,t2)) n1 y2
| Some (fc,tc), [], (f,t) :: tr when f <= tc && t > tc -> calc (Some (fc,t)) [] tr
| Some (fc,tc), [], (f,t) :: tr when f <= tc && t <= tc -> calc (Some (fc,tc)) [] tr
| Some (fc,tc), [], x -> [fc,tc] @ x
| Some (fc,tc), (f,t) :: tr, [] when f <= tc && t > tc -> calc (Some (fc,t)) tr []
| Some (fc,tc), (f,t) :: tr, [] when f <= tc && t <= tc -> calc (Some (fc,tc)) tr []
| Some (fc,tc), x, [] -> [fc,tc] @ x
It uses the c
argument to store the current interval (on which merge overlapping ranges) while l1
and l2
are two int*int list
.
Syntax is easy, every row represent a single case that has c
, l1
and l2
specified in a precise way. Let me give you just some examples:
(Some (fc,tc) as cur), (f1,t1) :: y1, ((f2,t2) :: y2 as n2) when t1 <= fc -> calc cur y1 n2
I have a current range fc,tc
and the two lists contains at least one element (that's why of (f1,t1)::y1
), so if t1 <= fc
then the range of first list ends before the current and i can be discarded so it calls recursively itself with same cur
range, y1
in which the first one is discarded and n2
that is an alias for the same list received as an argument.
This is similar for every other case, when I find that no next range overlaps with cur
the I return cur
as an element of the final answer and start again from an empty one.