Rectangles Covering

2019-01-22 13:09发布

I have N rectangles with sides parallel to the x- and y-axes. There is another rectangle, model. I need to create an algorithm that can tell whether the model is completely covered by the N rectangles.

I have some ideas. I think that first, I need to sort the rectangles by their left side (it can be done in O(n log n) time), and then use a vertical sweeping line.

+------------------------------------------------------------> x
|O
|                  +----+
|   +---------+    |    |
|   |        ++----+--+ |
|   |      +-++----+-+| |
|   |      | |     +-++-+
|   +------+ +-------++
|          +---------+
|
|
|
|y

The blue rectangle is the model.

First of all, I need the abstract algorithm. There are no special requirements with regard to the realization. A rectangle can be represented as a pair of points (left-top and bottom-right).

This is one of the tasks for preparing for a test. I know that the best algorithm can do this in O(n log n) time.

12条回答
不美不萌又怎样
2楼-- · 2019-01-22 13:22

Okay, now it seems I can't even sleep at night because I think about this problem... but it also seems I finally got an O(n log n) solution, in average case (but with reduced chances of having a pathological input compared to @lVlad).

We all know the Divide and Conquer technic. To ensure O(n log n) using it, we usually focus on 2 points:

  • the divide and merge should be O(n)
  • there exist C > 1 such that at each step the size of the subproblems is reduced by a factor C (constant throughout the problem)

With these constraints in mind I have elaborated the following algorithm, which is reminiscent of qsort, and thus suffer the same pitfalls (namely, fractal inputs).

Algorithm

  1. Clipping: we only consider the portion of a red that intersect with blue, they are inserted in a HashSet to remove duplicates --> O(n)
  2. Pivot Selection: more on this later --> O(n)
  3. Partition: once we have a pivot, we subdivise the space in 3d zones, one of which being the pivot, with d being the dimension (d = 1 for segments, 2 for rectangles, 3 for cubes etc...)
  4. Repartition: we put the red in the partitions, applying the Clipping technic, note that a given red might end up giving several chunks in different partitions
  5. Recursion: we apply recursively (from step 2) on each partition, beginning by the least populated one and stopping as soon as one is not covered

The Pivot Choice is the corner stone of the algorithm, if the partition is ill-tailored we cannot achieve the required complexity. Also, it must be accomplished in O(n). I have 2 proposals so far:

  • Maximum Area: use the rectangle with the greatest area, because it means that the partitions will have the smallest area afterward, however it suffers from being easy to trump
  • Median of 3: based on qsort, pick up 3 elements, selection the median (the one with the center closer to the barycenter of the 3 centers)

I propose to mix them up thusly:

  • Pick up the 3 elements with the greatest area, select the median, use it at pivot
  • If after the repartition it turns out that one of the partition is populated with more than N/C (to be customized) elements, pick up 3 elements at random, select the median, and do the repartition (no check here)

Another aspect of implementation is the Tail of the recursion. Like qsort it's probable that the algorithm is inefficient for small n. Instead of going all the way to 1, I propose to use the introsort trick: if n is smaller than say 12, then use the following algorithm instead:

  • For each axis, project each red on the axis (only the edges) and sort them (ala introsort)
  • This defines a raster of nd zones, check that each is covered

Dimension agnostic

The algorithm is formally defined to be applicable in any given dimension with the same asymptotic complexity, in average O(n log n). This gives us the opportunity to test in dimension 1 to identify the pathological inputs.

Pathological input

Like qsort on which it is modelled it is sensible to factorial inputs. By factorial inputs I mean:

1.......6...9.11.13

whenever you pick the average of your interval, you have all the elements on one side of it.

With such an input even choosing the median of 3 is unlikely to yield a very good cut.

EDIT:

I am going to show the partition idea with a little scheme, as @lVlad noted it was kind of fuzzy.

+----------------+----+---------+
|       1        | 2  |   3     |
+----------------+----+---------+
|       8        | P  |   4     |
+----------------+----+---------+
|       7        | 6  |   5     |
+----------------+----+---------+

Okay, so the rectangle we check for coverage is now splitted into 9 subrectangles. We ignore P, it's our pivot.

Now, we would really like that each subrectangle is covered by less red than N. The pivot is chosen as a barycenter of the centers, thus it means if our "random" choice held true that there are about as many reds centers in each direction of the pivot.

It's kind of fuzzy there because some special configurations might make it so that there is little gain at one step (all rectangles have the same center and we just picked the smaller one for example), but it will create chaos and thus the following step will be different.

I am happy if anyone can formalize that, I am an engineer, not a computer scientist, and my maths lag behind...

查看更多
Luminary・发光体
3楼-- · 2019-01-22 13:22

Do you know the usual worst-case O(nlogn) algorithm for the area of the union of rectangles?

All you need to do here is to compute the two areas:

  1. The area of your N rectangles
  2. The area of your N rectangles and the model

If these areas are equal, the model is totally covered, otherwise it isn't.

查看更多
Fickle 薄情
4楼-- · 2019-01-22 13:26

I've been thinking about it and I think I finally understood what @algorithmist meant by sweep line. However the very presence of sorting operations means that I have:

  • O(n log n) in average
  • O(n**2) in the worst case

Sweep Line

First of all, we need some sorting, because our sweep line will consist of walking an ordered set.

This ordered set will feature the top and bottom line of each of the reds, as long as they are between the top and bottom of blue. This divides our space into (at most) n*2 horizontal strips.

Now, we need to make sure that in each strip, the whole of blue is covered, and this operation cannot have more than O(log n) complexity, this could be done if we had (for each strip) a list of the covered intervals. This is the 'event' @algorithmist is speaking of

To handle this event, we'll use a binary tree described below which handles adding or removing a rectangle in exactly O(log n) operations and yields the rightmost interval covered by the tree, which we use to tell if the strip of blue is covered or not.

Binary Tree

First of all, I am going to index the red rectangles. We sort them using this function:

def __lt__(lhs, rhs):
  return (lhs.left <  rhs.left)
      or (lhs.left == rhs.left and lhs.right < rhs.right)

I am going then to create a dedicated binary tree.

  • It will have N leaves, each representing a red rectangle and indicating its presence or absence. They are ordered according to the index.
  • Each intermediary node will have for value the rightmost interval covered by its children

Handling the bug "code block cannot follow list":

class Node:
  def __init__(self):
    self.interval = []
    self.left  = None
    self.right = None

Now we have two possibilities, let's say the children cover [a,b] and [c,d]:

  • if c <= b, then the node hold [a,d]
  • else it holds [c,d]

Why does it works ? Let's take an example using 4 leaves:

        _ [1,9] _
       /         \
   [1,7]         [6,9] <-- Special node merge
   /   \         /   \
  /     \       /     \
[1,3] [2,7]   [3,5] [6,9]

The special node ignore [3,5] because it's not the rightmost interval. The reasoning is that the rectangles are ordered:

  • No rectangle on the right of [6,9] will cover the missing [5,6] interval since they begin after 6
  • The rectangles on the left of [3,5] begin before 3, so if they cover the missing [5,6] they'll cover [3,5] anyway

The root may not indicate the exact set of intervals covered: only the rightmost interval covered. However, it's perfectly sufficient for us to tell if blue is completely covered or not!

There are 2 operations available on this tree:

  • Marking a rectangle as present
  • Marking a rectangle as absent

Each is similar:

  • if the rectangle was already in this state, do nothing
  • else, toggle its state, then update its parent interval (recursively, up to the root)

The recursive bit takes O(log n). It's a classic property of the balanced binary trees. And once it's done we have the rightmost interval covered by the root which is sufficient to tell whether or not the blue segment is entirely covered or not.

Complexity

The complexity of the algorithm is simple:

  • We have O(n) events
  • Each event is handled in O(log n)

Which yields O(n log n) for the core part.

However, we should not forget that we also have 2 sort operations:

  • one to classify the events (for the sweep line)
  • the other to classify the rectangles (for the binary tree)

Each shall take O(n log n) in average, but may degenerate into O(n**2) in the worst case, depending on the sorting algorithm used.

查看更多
霸刀☆藐视天下
5楼-- · 2019-01-22 13:26

Here's a way to do this without using rasterization, that is, using only pure numbers.

Note: This is not O(n log n), more like O(n^2). It is, however, a solution. Whether it is an answer, probably not if O(n log n) is a requirement.

  1. Create a list of all the unique X-coordinates of the left and right edges (in the same list)
  2. When you build this list, for each coordinate, also associate a list of rectangles with it, and denote in this list whether the X-coordinate the list is associated with is the left or the right edge of the rectangle
  3. Sort the list of coordinates so that it is ascending, going from left to right
  4. Create a new list of rectangles, initially empty
  5. Run through the list of coordinates, and process the associated list of rectangles for it
  6. All rectangles in the associated list that are denoted as having the coordinate as the left edge should be added to the list you created in 4.
  7. All rectangles with the coordinate as the right edge should be removed.
  8. The order of adding and removing should actually be done in the opposite order, first you want to remove the right-edge-rectangles, and then you add all the left-edge-rectangles
  9. Now, before you remove the rectangles from the list you created in 4, you want to process them. You process them by treating them as "sub-rectangles". Their "new" left edge coordinate is the X-coordinate you processed the previous iteration of the loop (in 5), and the new right edge is the current X-coordinate being processed
  10. The output of the algorithm is a collection of coordinate-pairs (the two X-coordinates left and right), each pair having an associated list of rectangles (the vertical strip)

The output should thus be:

  1. X=1..2, Rects=A
  2. X=2..3, Rects=A, B
  3. X=3..4, Rects=A, B, C
  4. X=4..5, Rects=A, C
  5. X=5..6, Rects=C

Let me illustrate the process so far

+-------------------+
|A                  |
|        +----------+-----+
|        |C         |     |
|  +-----+----+     |     |
|  |B    |    |     |     |
|  |     +----+-----+-----+
|  |          |     |
+--+----------+-----+
   |          |
   +----------+

^  ^     ^    ^     ^     ^
1  2     3    4     5     6  <-- X-coordinates

Associated rectangles:

  1. left: A, right: (none)
  2. left: B, right: (none)
  3. left: C, right: (none)
  4. left: (none), right: B
  5. left: (none), right: A
  6. left: (none), right: C

You now create an empty list, L=[], and start processing the coordinates and associated rectangles:

X=1

List is empty, nothing to process Nothing to remove Add A: L=[ A ]

X=2

List contains rectangles, process list as rectangles that have a left edge of X=1, and a right edge of X=2 (the two coordinates we've processed so far), and use their original top and bottom coordinates. Nothing to remove. Add B: L=[ A, B ]

X=3

List contains rectangles, process list (both A and B) the same way, treat them as temporarily having left and right coordinates as X=2 and X=3, and use their original top and bottom coordinates. Nothing to remove Add C: L=[ A, B, C ]

X=4

Process the three rectangles the same way as above, temporary left and right coordinates are X=3 and X=4 Remove B: L=[A, C ] Nothing to add

X=5 and X=6

Process these in the exact same manner.

This means you will end up with "strips" of rectangles, like this (I've pulled them a bit apart to clearer illustrate that they are strips, but they are located side-by-side continously like in the original diagram):

+--+ +-----+ +----+ ------+
|A | |     | |    | |     |
|  | |     | +----+ +-----+ +-----+
|  | |     | |C   | |     | |     |
|  | +-----| +----+ |     | |     |
|  | |B    | |    | |     | |     |
|  | |     | +----+ +-----| +-----+
|  | |     | |    | |     |
+--+ +-----+ +----+ +-----+
     |     | |    |
     +-----+ +----+
^  ^ ^     ^ ^    ^ ^     ^ ^     ^
1  2 2     3 3    4 4     5 5     6

Ok, so now you have your output, a collection of coordinate-pairs, each pair having an associated list of rectangles.

Now we do a trick. We process the vertical strip in the exact same manner, only this time we use the Y coordinates as the delimiters. Let's handle the strip between 3 and 4 above. Remember that the strip has a left and right coordinates of 3 and 4.

Strip looks like this:

^     +----+ <-- 1
A     |    |
|   ^ +----+ <-- 2
|   C |    |
| ^ | +----+ <-- 3
| B | |    |
| | V +----+ <-- 4
| |   |    |
V |   +----+ <-- 5
  |   |    |
  V   +----+ <-- 6

List of coordinates with rectangles:

  1. Top=A, Bottom=(none)
  2. Top=C, Bottom=(none)
  3. Top=B, Bottom=(none)
  4. Top=(none), Bottom=C
  5. Top=(none), Bottom=A
  6. Top=(none), Bottom=B

New empty list L=[]

Process the coordinates:

Y=1

Nothing to process (L=[]) Add A to list, L=[ A ]

Y=2

Process A with temporarily having a top and bottom coordinates of Y=1 and 2 (and remember that it also has a temporary left and right coordinates of X=3 and 4 Add C, L=[ A, C ]

Y=3

Process A and C, both now having temporary coordinates of (3, 2)-(4, 3) Add B, L=[ A, B, C ]

Y=4

Process A, B and C, all having coordinates of (3, 3)-(4, 4) Remove C, L=[ A, B ]

Y=5

Process A and B, coordinates (3, 4)-(4, 5) Remove A, L=[ B ]

Y=6

Process B, coordinates (3, 5)-(4, 6)

Final output:

pairs of Y-coordinates, with rectangles associated with them (that also have temporarily got new X-coordinates):

  • (3, 1)-(4, 2) - A
  • (3, 2)-(4, 3) - A, C
  • (3, 3)-(4, 4) - A, B, C
  • (3, 4)-(4, 5) - A, B
  • (3, 5)-(4, 6) - B

Now, suppose you want to ask the question: Is B fully covered by all any combination of the other rectangles.

The answer can be worked out as follows:

  1. Loop through all the rectangles in the final list above
  2. If B is NOT part of the associated rectangle, ignore this rectangle
  3. If there is any other of the original rectangles associated with the coordinates, then this part of B is covered by that/those rectangle(s)
  4. If there is no other of the original rectangles associated with the coordinates, then this part of B is not covered.

In the above example, we see that the 3rd and 4rd rectangle in the final list contains B, but also contains other rectangles, hence those parts of B is covered, but the final rectangle in the list also contains B, but no other rectangle, hence this part is not covered.

According to the original diagram, the final result would include rectangles as follows (the letters inside each denote which original rectangle is associated with this new rectangle):

+--+-----+----+-----+
|A |A    |A   |A    |
|  |     +----+-----+-----+
|  |     |AC  |AC   |C    |
|  +-----+----+     |     |
|  |AB   |ABC |     |     |
|  |     +----+-----+-----+
|  |     |AB  |A    |
+--+-----+----+-----+
   |B    |B   |
   +-----+----+

If we now take a new look at the original diagram, I have shaded out the parts that the above algorithm would find contains B, but no other rectangle:

+-------------------+
|A                  |
|        +----------+-----+
|        |C         |     |
|  +-----+----+     |     |
|  |B    |    |     |     |
|  |     +----+-----+-----+
|  |          |     |
+--+-----+----+-----+
   |#####|####|
   +-----+----+

The vertical bar in the middle there is to illustrate that the part would be returned as two rectangles, split at that location, due to the way the vertical strips were worked out.

I seriously hope I made myself understood here. I have some code that can help you with the implementation of each run through the lists of coordinates, but it's 01:21 past midnight here and I'm going to bed, but leave a comment if you wish to see some actual code for this.

Or it would be a great exercise to implement it yourself :)

Here's the link to the class containing the method in question: RangeExtensions.cs.

The method is the Slice method, just search the page for it. The type in question, Range, is basically a range from one value to another, so there's a bit of data construction and maintenance in addition to the above algorithm.

查看更多
Bombasti
6楼-- · 2019-01-22 13:32

Here's a generic algorithm

  1. is there any rectangle covering the whole model?
    if yes you are done
  2. if no are there any rectangles partially covering the model?
    if yes
  3. is the union of all the intersections of rectangle and model equal to the model
    if 2. or 3. is no then the answer is no, otherwise it is yes

Now the question is how to do the above efficiently. The above can be done in a single loop over all polygons, so I think you are looking at O(n) time.

If you need to create efficient algorithm that will test multiple models, or if you must optimize for fastest answer possible (at the expense of preparing the data) then you are looking for a structure that will allow quick answer to question if a rectangle intersects (or contains) a rectangle.

If you use, for example kd-trees, I believe that this can be answered in O(log n) time - but the important variable in this algorithm is also the number of found rectangles k. You will end up with something like O(k + log n) and you will also need to do the union part to test if the model is covered.

My guess is that the union could be computed in O(k log k), so if k is small then you are looking at O(log n) and if k is comparable to n then it should be O(k log k).

See also this question.

EDIT: In response to complexity of intersections and unions.

In more details, we have two scenarios depending on if k << n or k comparable to n

a) in case of k << n and assuming polynomial complexity for intersection/union (so here I give up the guess O(k log k) ) we have:

  • log n to find a range in kd indexed tree (of rectangles),
  • k steps to retrieve all the rectangles in that 'branch',
  • f(k) some polynomial time to calculate intersections and union

The total is O(k + log n + f(k)), which is directly equal to O(log n) since big O depends only on the fastest growing term.

In this case the more significant part of the algorithm is finding the polygons.

b) in the case of k comparable to n we must take a look at the complexity of intersection and union algorithms (notice the parallel here on how the RDBMs, depending on selectivity, might use index or do table scan; it is a similar choice to what we have here).
If k is big enough the algorithm becomes less of an algorithm for finding rectangles that intersect with the rectangle and becomes more of an algorithm for calculating the union of polygons.

But, i believe that the kd tree can also help in finding the intersection of segments (which are necessary to build the union), so even this part of algorithm might not need k^2 time. At this point I would investigate efficient algorithms for calculating the area of unions.

查看更多
地球回转人心会变
7楼-- · 2019-01-22 13:33

OK, I've asked enough questions, here's something of an answer ...

If the data is represented as a raster one algorithm is trivial:

  1. Create a boolean array representing the model (ie Blue) rectangle. Set all elements to FALSE (representing 'not covered')
  2. For each Red rectangle (ignore the ones which can't overlap the Blue) set all elements of the array to TRUE if they are inside the Red rectangle.
  3. Finally, check whether or not every element of the array has been set TRUE.

If the data is vector it's a little more complicated. First define a function which returns the rectangle representing the intersection (if any) of two rectangles. This is simple. Then proceed:

  1. Create a variable called 'UnCoveredRectangle' which is initialised to be the same as the Blue rectangle.
  2. Again, only bother with the Red rectangles which intersect the Blue one. For each Red rectangle, compute the intersection of the rectangle with the UnCoveredRectangle. The intersection will result in one of the following situations:

    2.1 The intersection equals the UnCoveredRectangle. The job is finished.

    2.2 The intersection 'bites' a rectangular chunk out of the CoveredRectangle. The remaining UnCoveredRectangle will be either a rectangle, an L-shaped piece, or a U-shaped piece. If it is a rectangle itself, set UnCoveredRectangle to be the remaining rectangle, and go to step 2. If the UnCoveredRectangle is L- or U-shaped, split it into 2, or 3, rectangles and recurse from step 2..

  3. If you run out of Red rectangles before the area of (part of) UnCoveredRectangle is sent to 0, you've finished.

OK I haven't got a clue about the complexity of this algorithm, but unless the number of rectangles is huge, I'm not too concerned, though perhaps @den is. And I've omitted a lot of details. And I can't draw nice diagrams like @den did, so you'll have to picture it for yourselves.

查看更多
登录 后发表回答