I have N rectangles with sides parallel to the x- and y-axes. There is another rectangle, model. I need to create an algorithm that can tell whether the model is completely covered by the N rectangles.
I have some ideas. I think that first, I need to sort the rectangles by their left side (it can be done in O(n log n) time), and then use a vertical sweeping line.
+------------------------------------------------------------> x
|O
| +----+
| +---------+ | |
| | ++----+--+ |
| | +-++----+-+| |
| | | | +-++-+
| +------+ +-------++
| +---------+
|
|
|
|y
The blue rectangle is the model.
First of all, I need the abstract algorithm. There are no special requirements with regard to the realization. A rectangle can be represented as a pair of points (left-top and bottom-right).
This is one of the tasks for preparing for a test. I know that the best algorithm can do this in O(n log n) time.
Okay, now it seems I can't even sleep at night because I think about this problem... but it also seems I finally got an O(n log n) solution, in average case (but with reduced chances of having a pathological input compared to
@lVlad
).We all know the Divide and Conquer technic. To ensure
O(n log n)
using it, we usually focus on 2 points:O(n)
With these constraints in mind I have elaborated the following algorithm, which is reminiscent of
qsort
, and thus suffer the same pitfalls (namely, fractal inputs).Algorithm
red
that intersect withblue
, they are inserted in a HashSet to remove duplicates -->O(n)
O(n)
red
in the partitions, applying the Clipping technic, note that a givenred
might end up giving several chunks in different partitionsThe Pivot Choice is the corner stone of the algorithm, if the partition is ill-tailored we cannot achieve the required complexity. Also, it must be accomplished in
O(n)
. I have 2 proposals so far:Maximum Area
: use the rectangle with the greatest area, because it means that the partitions will have the smallest area afterward, however it suffers from being easy to trumpMedian of 3
: based on qsort, pick up 3 elements, selection the median (the one with the center closer to the barycenter of the 3 centers)I propose to mix them up thusly:
Another aspect of implementation is the Tail of the recursion. Like
qsort
it's probable that the algorithm is inefficient for smalln
. Instead of going all the way to 1, I propose to use theintrosort
trick: ifn
is smaller than say 12, then use the following algorithm instead:red
on the axis (only the edges) and sort them (ala introsort)Dimension agnostic
The algorithm is formally defined to be applicable in any given dimension with the same asymptotic complexity, in average O(n log n). This gives us the opportunity to test in dimension 1 to identify the pathological inputs.
Pathological input
Like
qsort
on which it is modelled it is sensible to factorial inputs. By factorial inputs I mean:whenever you pick the average of your interval, you have all the elements on one side of it.
With such an input even choosing the median of 3 is unlikely to yield a very good cut.
EDIT:
I am going to show the partition idea with a little scheme, as
@lVlad
noted it was kind of fuzzy.Okay, so the rectangle we check for coverage is now splitted into 9 subrectangles. We ignore P, it's our pivot.
Now, we would really like that each subrectangle is covered by less
red
thanN
. The pivot is chosen as a barycenter of the centers, thus it means if our "random" choice held true that there are about as manyred
s centers in each direction of the pivot.It's kind of fuzzy there because some special configurations might make it so that there is little gain at one step (all rectangles have the same center and we just picked the smaller one for example), but it will create chaos and thus the following step will be different.
I am happy if anyone can formalize that, I am an engineer, not a computer scientist, and my maths lag behind...
Do you know the usual worst-case
O(nlogn)
algorithm for the area of the union of rectangles?All you need to do here is to compute the two areas:
If these areas are equal, the model is totally covered, otherwise it isn't.
I've been thinking about it and I think I finally understood what
@algorithmist
meant by sweep line. However the very presence ofsorting
operations means that I have:O(n log n)
in averageO(n**2)
in the worst caseSweep Line
First of all, we need some sorting, because our
sweep line
will consist of walking an ordered set.This ordered set will feature the
top
andbottom
line of each of thered
s, as long as they are between thetop
andbottom
ofblue
. This divides our space into (at most)n*2
horizontal strips.Now, we need to make sure that in each
strip
, the whole ofblue
is covered, and this operation cannot have more thanO(log n)
complexity, this could be done if we had (for each strip) a list of the covered intervals. This is the 'event'@algorithmist
is speaking ofTo handle this event, we'll use a binary tree described below which handles adding or removing a rectangle in exactly
O(log n)
operations and yields the rightmost interval covered by the tree, which we use to tell if the strip ofblue
is covered or not.Binary Tree
First of all, I am going to index the
red
rectangles. We sort them using this function:I am going then to create a dedicated binary tree.
N
leaves, each representing ared
rectangle and indicating its presence or absence. They are ordered according to the index.Handling the bug "code block cannot follow list":
Now we have two possibilities, let's say the children cover
[a,b]
and[c,d]
:c <= b
, then the node hold[a,d]
[c,d]
Why does it works ? Let's take an example using 4 leaves:
The special node ignore
[3,5]
because it's not the rightmost interval. The reasoning is that the rectangles are ordered:[6,9]
will cover the missing[5,6]
interval since they begin after6
[3,5]
begin before3
, so if they cover the missing[5,6]
they'll cover[3,5]
anywayThe root may not indicate the exact set of intervals covered: only the rightmost interval covered. However, it's perfectly sufficient for us to tell if
blue
is completely covered or not!There are 2 operations available on this tree:
Each is similar:
The recursive bit takes
O(log n)
. It's a classic property of the balanced binary trees. And once it's done we have the rightmost interval covered by the root which is sufficient to tell whether or not theblue
segment is entirely covered or not.Complexity
The complexity of the algorithm is simple:
O(n)
eventsO(log n)
Which yields
O(n log n)
for the core part.However, we should not forget that we also have 2
sort
operations:Each shall take
O(n log n)
in average, but may degenerate intoO(n**2)
in the worst case, depending on the sorting algorithm used.Here's a way to do this without using rasterization, that is, using only pure numbers.
Note: This is not O(n log n), more like O(n^2). It is, however, a solution. Whether it is an answer, probably not if O(n log n) is a requirement.
The output should thus be:
Let me illustrate the process so far
Associated rectangles:
You now create an empty list,
L=[]
, and start processing the coordinates and associated rectangles:X=1
List is empty, nothing to process Nothing to remove Add A: L=[ A ]
X=2
List contains rectangles, process list as rectangles that have a left edge of X=1, and a right edge of X=2 (the two coordinates we've processed so far), and use their original top and bottom coordinates. Nothing to remove. Add B: L=[ A, B ]
X=3
List contains rectangles, process list (both A and B) the same way, treat them as temporarily having left and right coordinates as X=2 and X=3, and use their original top and bottom coordinates. Nothing to remove Add C: L=[ A, B, C ]
X=4
Process the three rectangles the same way as above, temporary left and right coordinates are X=3 and X=4 Remove B: L=[A, C ] Nothing to add
X=5 and X=6
Process these in the exact same manner.
This means you will end up with "strips" of rectangles, like this (I've pulled them a bit apart to clearer illustrate that they are strips, but they are located side-by-side continously like in the original diagram):
Ok, so now you have your output, a collection of coordinate-pairs, each pair having an associated list of rectangles.
Now we do a trick. We process the vertical strip in the exact same manner, only this time we use the Y coordinates as the delimiters. Let's handle the strip between 3 and 4 above. Remember that the strip has a left and right coordinates of 3 and 4.
Strip looks like this:
List of coordinates with rectangles:
New empty list L=[]
Process the coordinates:
Y=1
Nothing to process (L=[]) Add A to list, L=[ A ]
Y=2
Process A with temporarily having a top and bottom coordinates of Y=1 and 2 (and remember that it also has a temporary left and right coordinates of X=3 and 4 Add C, L=[ A, C ]
Y=3
Process A and C, both now having temporary coordinates of (3, 2)-(4, 3) Add B, L=[ A, B, C ]
Y=4
Process A, B and C, all having coordinates of (3, 3)-(4, 4) Remove C, L=[ A, B ]
Y=5
Process A and B, coordinates (3, 4)-(4, 5) Remove A, L=[ B ]
Y=6
Process B, coordinates (3, 5)-(4, 6)
Final output:
pairs of Y-coordinates, with rectangles associated with them (that also have temporarily got new X-coordinates):
Now, suppose you want to ask the question: Is B fully covered by all any combination of the other rectangles.
The answer can be worked out as follows:
In the above example, we see that the 3rd and 4rd rectangle in the final list contains B, but also contains other rectangles, hence those parts of B is covered, but the final rectangle in the list also contains B, but no other rectangle, hence this part is not covered.
According to the original diagram, the final result would include rectangles as follows (the letters inside each denote which original rectangle is associated with this new rectangle):
If we now take a new look at the original diagram, I have shaded out the parts that the above algorithm would find contains B, but no other rectangle:
The vertical bar in the middle there is to illustrate that the part would be returned as two rectangles, split at that location, due to the way the vertical strips were worked out.
I seriously hope I made myself understood here. I have some code that can help you with the implementation of each run through the lists of coordinates, but it's 01:21 past midnight here and I'm going to bed, but leave a comment if you wish to see some actual code for this.
Or it would be a great exercise to implement it yourself :)
Here's the link to the class containing the method in question: RangeExtensions.cs.
The method is the
Slice
method, just search the page for it. The type in question, Range, is basically a range from one value to another, so there's a bit of data construction and maintenance in addition to the above algorithm.Here's a generic algorithm
if yes you are done
if yes
if 2. or 3. is no then the answer is no, otherwise it is yes
Now the question is how to do the above efficiently. The above can be done in a single loop over all polygons, so I think you are looking at O(n) time.
If you need to create efficient algorithm that will test multiple models, or if you must optimize for fastest answer possible (at the expense of preparing the data) then you are looking for a structure that will allow quick answer to question if a rectangle intersects (or contains) a rectangle.
If you use, for example kd-trees, I believe that this can be answered in O(log n) time - but the important variable in this algorithm is also the number of found rectangles k. You will end up with something like O(k + log n) and you will also need to do the union part to test if the model is covered.
My guess is that the union could be computed in O(k log k), so if k is small then you are looking at O(log n) and if k is comparable to n then it should be O(k log k).
See also this question.
EDIT: In response to complexity of intersections and unions.
In more details, we have two scenarios depending on if k << n or k comparable to n
a) in case of k << n and assuming polynomial complexity for intersection/union (so here I give up the guess O(k log k) ) we have:
The total is O(k + log n + f(k)), which is directly equal to O(log n) since big O depends only on the fastest growing term.
In this case the more significant part of the algorithm is finding the polygons.
b) in the case of k comparable to n we must take a look at the complexity of intersection and union algorithms (notice the parallel here on how the RDBMs, depending on selectivity, might use index or do table scan; it is a similar choice to what we have here).
If k is big enough the algorithm becomes less of an algorithm for finding rectangles that intersect with the rectangle and becomes more of an algorithm for calculating the union of polygons.
But, i believe that the kd tree can also help in finding the intersection of segments (which are necessary to build the union), so even this part of algorithm might not need k^2 time. At this point I would investigate efficient algorithms for calculating the area of unions.
OK, I've asked enough questions, here's something of an answer ...
If the data is represented as a raster one algorithm is trivial:
If the data is vector it's a little more complicated. First define a function which returns the rectangle representing the intersection (if any) of two rectangles. This is simple. Then proceed:
Again, only bother with the Red rectangles which intersect the Blue one. For each Red rectangle, compute the intersection of the rectangle with the UnCoveredRectangle. The intersection will result in one of the following situations:
2.1 The intersection equals the UnCoveredRectangle. The job is finished.
2.2 The intersection 'bites' a rectangular chunk out of the CoveredRectangle. The remaining UnCoveredRectangle will be either a rectangle, an L-shaped piece, or a U-shaped piece. If it is a rectangle itself, set UnCoveredRectangle to be the remaining rectangle, and go to step 2. If the UnCoveredRectangle is L- or U-shaped, split it into 2, or 3, rectangles and recurse from step 2..
If you run out of Red rectangles before the area of (part of) UnCoveredRectangle is sent to 0, you've finished.
OK I haven't got a clue about the complexity of this algorithm, but unless the number of rectangles is huge, I'm not too concerned, though perhaps @den is. And I've omitted a lot of details. And I can't draw nice diagrams like @den did, so you'll have to picture it for yourselves.