I have two sets of ranges, represented by [ start, stop ] values. Some of the ranges overlap, meaning that the start of one range is in between the [ start, stop ] of the other range. I'd like to make a new set of ranges that has no such overlap, and also doesn't include any new values in a range.
The ranges look like this:
@starts @ends
5 108
5 187
44 187
44 229
44 236
64 236
104 236
580 644
632 770
The output that I expect would be this:
@starts @ends
5 236
580 770
This is because the first seven ranges overlap with the interval from 5 => 236, and the last two overlap with the interval from 632 => 770.
Here's the code that I tried:
$fix = 0;
foreach (@ends) {
if ($starts[$fix + 1] < $ends[$fix]) {
splice(@ends, $fix, $fix);
splice(@starts, $fix + 1, $fix + 1);
} else {
$fix += 1;
}
}
I can print out the values myself, I just need help with the algorithm for merging.
This edits your arrays in-place, simply collapsing boundaries when they overlap.
# Since they're sorted by @starts, accept the 0th interval, start at 1
for (1..$#starts) {
# extra check on array bounds, since we edit in-place
last unless $_ < @starts;
# don't need to collapse if no overlap with previous end
next unless $starts[$_] <= $ends[$_-1];
# delete this start and the previous end
splice(@starts,$_,1);
splice(@ends,$_-1,1);
# rerun this loop for the same value of $_ since it was deleted
redo;
}
I think that this is what you want. You have a series of ranges of the form [start,stop], and you'd like to merge the overlapping ranges. The approach below is fairly simple.
- There are two sets of ranges, the
original set and the merged set.
- You add the first range to the set
of merged (non-overlapping) ranges.
For each candidate range left from
the original set, you make a choice:
- If that candidate overlaps with one already in the merged set, you extend the boundaries of the range in the merged set appropriately.
- If there is no overlap between the candidate range and any range in the merged set, you add the candidate to the merged set.
Hopefully this makes sense. It's not too obvious from your question that this is what you wanted, so let me know if this isn't right.
#!/usr/bin/perl
use strict;
use warnings;
my @starts = qw/ 5 5 44 44 44 64 104 580 632 /;
my @ends = qw/ 108 187 187 229 236 236 236 644 770 /;
my @ranges;
while ( @starts && @ends ) {
my $s = shift @starts;
my $e = shift @ends;
push @ranges, [ $s, $e ];
}
my @merged_ranges;
push @merged_ranges, shift @ranges;
foreach my $range (@ranges) {
my $overlap = 0;
foreach my $m_range (@merged_ranges) {
if ( ranges_overlap($range,$m_range) ) {
$overlap = 1;
$m_range = merge_range($range,$m_range);
}
}
if ( !$overlap ) {
push @merged_ranges, $range;
}
}
print join ' ', qw/ start end /;
print "\n";
foreach my $range (@merged_ranges) {
print join ' ', ( $range->[0], $range->[1] );
print "\n";
}
sub ranges_overlap {
my $r1 = shift;
my $r2 = shift;
return ( $r1->[0] <= $r2->[1] && $r2->[0] <= $r1->[1] );
}
sub merge_range {
my $r1 = shift;
my $r2 = shift;
use List::Util qw/ min max/;
my $merged = [ min($r1->[0],$r2->[0]), max($r1->[1],$r2->[1]) ];
return $merged;
}
Since the arrays are ordered by start, then the easiest is to work from the end:
# this assumes at least one element in @starts, @ends
my $n = $#starts;
for (my $i = $#starts - 1; $i >= 0; $i--) {
if ($ends[$i] < $starts[$n]) {
# new interval
$n--;
($starts[$n], $ends[$n]) = ($starts[$i], $ends[$i]);
} else {
# merge intervals - first scan for how far back to go
while ($n < $#starts && $ends[$i] < $starts[$n+1]) {
$n++;
}
$starts[$n] = $starts[$i];
}
}
@starts = @starts[$n..$#starts];
@ends = @ends[$n..$#ends];
I am not fluent in PERL, but the following pseudocode solution can probably be easily adapted:
for(i=0; i<N;){
//we know that the next merged interval starts here:
start = starts[i]
end = ends[i]
for(i=i+1; i < N && starts[i] < end; i++){ //perhaps you want <= ?
end = maximum(end, ends[i]);
}
add (start, end) to merged array
}
How's this?
#!perl
use strict;
use warnings;
my @starts = qw(5 5 44 44 44 64 104 580 632);
my @ends = qw(108 187 187 229 236 236 236 644 770);
my @starts_new;
my @ends_new;
if ((scalar @starts) ne (scalar @ends)) {
die "Arrays are not of equal length!\n";
}
my %ranges;
my $next_i = 0;
for (my $i=0; $i <= $#starts; $i=$next_i) {
# If nothing changes below, the next array item we'll visit is the next sequential one.
$next_i = $i + 1;
# Init some temp stuff.
my $start = $starts[$i]; # this one shouldn't change during this "for $i" loop
my $end = $ends[$i];
for (my $j=$i+1; $j <= $#ends; $j++) {
if ($starts[$j] <= $end) {
# This item further down the @starts array is actually less than
# (or equal to) the current $end.
# So, we need to "skip" this item in @starts and update
# $end to reflect the corresponding entry in @ends.
$next_i = $j +1;
$end = $ends[$j] if ($ends[$j] > $end);
}
}
# We have a valid start/end pair.
push (@starts_new, $start);
push (@ends_new, $end);
}
for (my $i=0; $i <= $#starts_new; $i++) {
print "$starts_new[$i], $ends_new[$i]\n";
}