I have a question that is very similar to algorithm to find longest non-overlapping sequences.
The only difference to the linked question is that instead of finding the set of non-overlapping tuples that represent the longest sequence, I need to find the set of non-overlapping tuples that represent the maximum coverage, by which I mean the sum of the tuple lengths is maximum (a tuple length being last - first + 1
given the definition of tuple in the next sentence).
I represent my tuples differently than the linked problem. Instead of (starting index, length)
, I represent my tuples as (first, last)
; for example, the tuple (3,7)
represents the set of numbers [3, 4, 5, 6, 7]
. (A tuple overlaps another tuple even if the end-points match; i.e., (2,6)
and (6,8)
overlap and therefore cannot both appear in the solution.)
As an example, consider the following set of tuples, sorted by first
:
[(0,100), (2,50), (30,150), (60,95), (110,190), (120,150), (191,200)]
The maximal set here would be [(0,100), (110,190), (191,200)]
with a coverage of 101 + 81 + 10 = 192
. (Note that the tuples in this solution are non-overlapping.)
What is an example of the least complex algorithm to solve this, and what is the complexity of that algorithm? (It would be just great if this could be solved in O(N)
, but I don't know at the moment if it can be.)
ADDENDUM: In retrospect, it turns out that the question I am asking here is equivalent to the weighted interval scheduling problem. This is a special case of the interval scheduling problem.
@j_random_hacker's answer, below, is, in fact, the known solution to the weighted interval scheduling problem, with a complexity in time of O(N log(N))
.
Here is an O(nlog n)-time, O(n)-space algorithm. First, sort the array of tuples by their starting position if they are not already in this order. I'll assume zero-based array indices.
Let's call the beginning position of tuple i b(i) and the ending position e(i), so that its total length is e(i) - b(i) + 1. Also let's define a function next(i) that returns the position within the tuple list of the first tuple that can appear to the right-hand side of tuple i. Notice that next(i) can be calculated in O(log n) time with a binary search: just keep all the tuple beginning positions b(i) in an array b[], and search for the first j in the subarray b[i+1 .. n-1] having b[j] > e(i).
Let's define f(i) to be the maximum coverage of any nonoverlapping set of tuples that begins at or after tuple i. Since tuple i itself is either in this optimal set or not, we have:
f(i) = max(e(i) - b(i) + 1 + f(next(i)), f(i+1)) for 0 <= i < n
We also have the boundary condition f(n) = 0
.
Clearly the largest possible coverage is given by f(0). This is easily calculated. In pseudo-C++:
int b[] = /* Tuple beginning positions, in nondecreasing order */;
int e[] = /* Tuple end positions */;
int n = /* Number of tuples */;
// Find the array position of the leftmost tuple that begins to the right of
// where tuple i ends.
int next(int i) {
return upper_bound(b + i + 1, b + n, e[i]);
}
int maxCov[n + 1]; // In practice you should dynamically allocate this
// After running this, maxCov[i] will contain the maximum coverage of any
// nonoverlapping subset of the set of n - i tuples whose beginning positions
// are given by b[i .. n-1] and whose ending points are given by e[i .. n-1].
// In particular, maxCov[0] will be the maximum coverage of the entire set.
void calc() {
maxCov[n] = 0;
for (int i = n - 1; i >= 0; --i) {
maxCov[i] = max(e[i] - b[i] + 1 + maxCov[next(i)], maxCov[i + 1]);
}
}
The loop in calc()
runs n times, and each iteration makes one O(log n) call to the binary search function upper_bound()
.
We can reconstruct an actual set of this size by calculating both the inputs to max() for f(0), seeing which one actually produced the maximum, recording whether it implies the presence or absence of tuple 0, and then recursing to handle the remainder (corresponding to either f(next(0)) or f(1)). (If both inputs are equal then there are multiple optimal solutions and we can follow either one.)
The algorithm below works by recursively retrieving the largest non-overlapping set that each element is the leftmost member of and then returning the one with the largest coverage. See comments in the code.
Implemented in PHP. You can test it here http://viper-7.com/xowTRF
I think this algorithm's complexity is O(2^N)
or O(N^2)
with caching, feel free to leave a comment if you disagree.
$set = [[0,100], [2,50], [30,150], [60,95], [110,190], [120,150], [191,200]];
$GLOBALS['cache'] = array(); //cache for overlapping sub problems
function maximumSet($set) {
if(count($set) === 1) {
return $set;
}
$cache_key = [];
foreach($set as $k) {
$cache_key[] = implode($k,'.');
}
$cache_key = implode('_',$cache_key);
if(isset($GLOBALS['cache'][$cache_key])) {
return $GLOBALS['cache'][$cache_key];
}
$maximumResult = null;
//for each element in the set,
//get the largest non-overlapping set that the element is a member of
//once all N sets have been computed, return the largest one
foreach($set as $k => $element) {
unset($set[$k]);
//create a new set $copy, which contains the remaining elements that
//do not overlap with $element
$copy = $set;
foreach($set as $k2 => $element2) {
if($element2[0] <= $element[1]) {
//element is considered non overlapping if its start
//is less than or equal to current element's end
unset($copy[$k2]);
}
else break; //because the list is sorted we can break the 1st time
//see a non-overlapping element
}
if(!empty($copy)) {
//if there is at least one non-overlapping element
//recursively get the maximum set
$maximumSubset = maximumSet($copy);
//prepend the current element to it
array_unshift($maximumSubset,$element);
}
else {
//otherwise the maximum non-overlapping set which contains this element
//is the element itself
$maximumSubset = [$element];
}
//store the set in the results by coverage
$coverage = getCoverage($maximumSubset);
if(is_null($maximumResult) || $maximumResult['coverage'] < $coverage) {
$maximumResult = [
'coverage' => $coverage,
'set' => $maximumSubset
];
}
}
$GLOBALS['cache'][$cache_key] = $maximumResult['set'];
return $maximumResult['set'];
}
function getCoverage($set) {
$range = 0;
foreach($set as $v) {
$range += ($v[1] - $v[0]);
}
return $range;
}
$x = maximumSet($set);
print "<pre>";
print_r($x);
print "</pre>";