Assume you have k<=10^5 intervals [a_i, b_i] \in [1,10^18] (some of them may overlap), and you need to choose a set of intervals mutually disjoint such that their union is maximal. Not maximum number of disjoint intervals, but the union must cover the most.
Can't try all possible subsets 2^k infeasible.
Greedy approaches ordering by a_i ( interval covering algorithm) and ordering by b_i ( maximum number of disjoint intervals algorithm ) didn't work
Can't figure out if there is a dynamic program solution.
Given the size of the input, I think the solution should be O(k log k) or O(k)
Examples
1. [1,4], [3,5], [5,9], [7, 18]
Sol [3,5]u[7,18]
[1,2], [2,6], [3,4], [5,7]
Sol [1,2]u[3,4]u[5,7]
[2,30], [25,39], [30,40]
Sol [2,30]
The problem can be solved in O(k log(k))
.
First sort the intervals by their upper bounds (the b_i
s). Let I(1), I(2), ..., I(k)
be the list of sorted intervals. That is,
b_1 <= b_2 <= ... <= b_k
Denote by w(i)
the length of interval I(i)
. That is,
w(i) = b_i - a_i
Denote by f(i)
the total length of the optimal solution among those whose last interval is I(i)
. That is, the solution corresponding to f(i)
is a set which:
- contains the interval
I(i)
- doesn't contain any interval whose upper bound is above
b_i
- has the maximum cover among the sets of (non-overlapping) intervals satisfying 1+2
Now we are going to compute f(1), f(2), ..., f(k)
and return the maximum value of them all. Clearly, the optimal solution corresponds to one of the f(i)
s and therefore the maximal f(i)
is the optimal solution.
To compute each f(i)
we use dynamic programming. We do this by relying on the following recurrence relation:
f(i) = w(i) + max{f(j) | b_j < a_i}
I'll demonstrate the computation with your first input example:
I(1)=[1, 4], w(1)=3
I(2)=[3, 5], w(2)=2
I(3)=[5, 9], w(3)=4
I(4)=[7, 18], w(4)=11
We compute f(i)
for i=1, 2, 3, 4
:
f(1) = w(1) + max{None} = 3
f(1) intervals: {I(1)}
f(2) = w(2) + max{None} = 2
f(2) intervals: {I(2)}
f(3) = w(3) + max{f(1)} = 4 + 1 = 5
f(3) intervals = {I(1), I(3)}
f(4) = w(4) + max{f(1), f(2)} = 11 + f(1) = 11 + 3 = 14
f(4) intervals = {I(1), I(4)}
The maximum f(i)
is f(4)
which corresponds to the set of intervals {I(1), I(4)}
, the optimal solution.
There seems to be a O(k * log(k)) solution. It can be achieved with segment tree data structure.
We may at first populate some endPos array of segment endings, sort it. Memorise for each of the segments corresponding endPos index. For this let endPosIdx be such array that endPosIdxj will store an index in endPos where the j-th segment ends.
Next we will introduce a segment tree. It will process the following requests:
1. getMax(i)
- get maximum value on the range [0, i]
.
2. update(i, value)
- update maximum at i
-th position with value
.
i is and index in endPos array. Calling getMax(i) we ask for what maximum cover can we achieve if non of the segments ends after endPosi. Calling update(i, value) we say that now there exists a cover with length value ending at endPosi.
Sort all segments in increasing order by their starting position aj. Process them in that order. The gist is to find the largest cover if we will certainly take current segment in resulting set. Current cover will equal to the sum of the length of current segment and max cover of the segments ending before current. Let j be the index of current segment (they are sorted by start pos). Let i then be such max index that endPosi ≤ aj (i may be found from j by binary search). Then we can find
coverj = lengthj + getMax(i)
Next we should update segment tree calling update(endPosIdxj, coverj) and proceed to the next segment.
After processing of all the segments the solution can be found by calling getMax(size(endPos)).