Question:
This is an interview question; the interview is already over.
Given a deck of rectangular cards, place them randomly on a rectangular table whose area is much larger than the total area of the cards. Some cards may overlap each other. Design an algorithm that calculates the area of the table covered by the cards, and analyze its time complexity. The coordinates of every vertex of every card are known. The cards can overlap in any pattern.
My idea:
Sort the cards by vertical coordinate in descending order.
Scan vertically from top to bottom. Each time the scan line reaches an edge or vertex of a card, start a new scan line and continue until it reaches the next edge, then find the covered area between the two lines. Finally, sum the areas between each pair of consecutive lines to get the result.
The problem is how to compute the covered area between two lines when that region is irregular.
Any help is appreciated. Thanks!
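For the special case where every card is axis-aligned, the region between two consecutive scan lines is never irregular: it is a set of x-intervals, and merging them gives the covered width of the strip. The sketch below shows that case only. The function name and the (x1, y1, x2, y2) tuple layout are assumptions made for illustration, and rotated cards would need trapezoids per strip instead of intervals.

```python
def union_area_axis_aligned(rects):
    """Area of the union of axis-aligned rectangles (x1, y1, x2, y2), x1 < x2, y1 < y2."""
    # Every distinct y-coordinate is a position where the scan line must stop.
    ys = sorted({y for _, y1, _, y2 in rects for y in (y1, y2)})
    total = 0.0
    for y_lo, y_hi in zip(ys, ys[1:]):
        # x-intervals of the cards that span this whole horizontal strip.
        spans = sorted((x1, x2) for x1, y1, x2, y2 in rects
                       if y1 <= y_lo and y_hi <= y2)
        covered = 0.0
        cur_lo = cur_hi = None
        for lo, hi in spans:
            if cur_hi is None or lo > cur_hi:
                # Gap before this interval: close the previous merged run.
                if cur_hi is not None:
                    covered += cur_hi - cur_lo
                cur_lo, cur_hi = lo, hi
            else:
                # Overlaps the current run: extend it.
                cur_hi = max(cur_hi, hi)
        if cur_hi is not None:
            covered += cur_hi - cur_lo
        total += covered * (y_hi - y_lo)
    return total
```

Written this way the sketch rescans all cards for every strip, so it costs roughly O(n^2 log n); a segment tree over the x-coordinates is the usual way to bring that down.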
Answer 1:
This could be done easily using the inclusion-exclusion formula (size of A union B union C = A + B + C - AB - AC - BC + ABC, etc.), but that would result in an O(2^n) algorithm, since the formula has one term per non-empty subset of cards. There is another, more complicated way that results in O(n^2 (log n)^2).
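To make the cost of the inclusion-exclusion formula concrete, here is a minimal sketch that assumes axis-aligned cards, so that the intersection of any subset is itself a rectangle. The helper names and the (x1, y1, x2, y2) layout are hypothetical. The loop visits every non-empty subset, which is why it is only usable for a handful of cards.

```python
from itertools import combinations

def intersection_area(rects):
    """Area common to all axis-aligned rectangles (x1, y1, x2, y2) in rects."""
    x1 = max(r[0] for r in rects)
    y1 = max(r[1] for r in rects)
    x2 = min(r[2] for r in rects)
    y2 = min(r[3] for r in rects)
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def union_area_inclusion_exclusion(rects):
    """Union area via inclusion-exclusion over all 2^n - 1 non-empty subsets."""
    total = 0.0
    for k in range(1, len(rects) + 1):
        # Odd-sized subsets are added, even-sized subsets are subtracted.
        sign = 1 if k % 2 == 1 else -1
        for subset in combinations(rects, k):
            total += sign * intersection_area(subset)
    return total
```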
Store each card as a polygon + its area in a list. Compare each polygon in the list to every other polygon. If they intersect, remove them both from the list, and add their union to the list. Continue until no polygons intersect. Sum their areas to find the total area.
The polygons can be concave and have holes, so computing their intersection is not easy. However, there are algorithms (and libraries) available to compute it in O(k log k), where k is the number of vertices. Since the number of vertices can be on the order of n, this means computing the intersection is O(n log n).
Comparing every polygon to every other polygon is O(n^2). However, we can use an O(n log n) sweeping algorithm to find nearest polygons instead, making the overall algorithm O((n log n)^2) = O(n^2 (log n)^2).
Answer 2:
This is almost certainly not what your interviewers were looking for, but I'd've proposed it just to see what they said in response:
I'm assuming that all cards are the same size and are strictly rectangular with no holes, but that they are placed randomly in an X,Y sense and also oriented randomly in a theta sense. Therefore, each card is characterized by a triple (x,y,theta), or of course you also have your quad of corner locations. With this information, we can do a Monte Carlo analysis fairly simply.
Simply generate a number of points at random on the surface of the table, and determine, by using the list, whether or not each point is covered by any card. If yes, keep it; if not, throw it out. Estimate the covered area as the table area multiplied by the ratio of kept points to total points.
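A minimal sketch of that estimate, assuming each card is given as its four corners in order (clockwise or counter-clockwise) and the table spans [0, table_w] x [0, table_h]. The function names, the sample count and the fixed seed are illustrative choices, not part of the original answer.

```python
import random

def inside_convex(point, corners):
    """True if point lies in the convex polygon whose corners are listed in order."""
    px, py = point
    sign = 0
    for i in range(len(corners)):
        ax, ay = corners[i]
        bx, by = corners[(i + 1) % len(corners)]
        # The cross product tells which side of edge a->b the point is on.
        cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
        if cross != 0:
            if sign == 0:
                sign = 1 if cross > 0 else -1
            elif (cross > 0) != (sign > 0):
                return False  # the point is on different sides of two edges
    return True

def monte_carlo_area(cards, table_w, table_h, samples=100_000, seed=0):
    """Estimate the covered area by uniform sampling over the table."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        p = (rng.uniform(0, table_w), rng.uniform(0, table_h))
        # O(n) test per point: stop at the first card that covers it.
        if any(inside_convex(p, card) for card in cards):
            hits += 1
    return table_w * table_h * hits / samples
```

The standard error shrinks like 1/sqrt(samples), so each extra digit of precision costs about 100 times more points.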
Obviously, you can test each point in O(n) where n is the number of cards. However, there is a slick little technique that I think applies here, and I think will speed things up. You can grid out your table top with an appropriate grid size (related to the size of the cards) and pre-process the cards to figure out which grids they could possibly be in. (You can over-estimate by pre-processing the cards as though they were circular disks with a diameter going between opposite corners.) Now build up a hash table with the keys as grid locations and the contents of each being any possible card that could possibly overlap that grid. (Cards will appear in multiple grids.)
Now every time you need to include or exclude a point, you don't need to check each card, but only the pre-processed cards that could possibly be in your point's grid location.
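The grid lookup could be sketched as follows. Note one substitution: the answer over-estimates each card with a bounding disk, while this sketch uses the axis-aligned bounding box of the corners, which is also a safe over-estimate and simpler to code. The names build_grid, candidates and cell are hypothetical.

```python
from collections import defaultdict

def build_grid(cards, cell):
    """Map each grid cell (i, j) to the indices of cards whose bounding box touches it."""
    grid = defaultdict(list)
    for idx, corners in enumerate(cards):
        xs = [x for x, _ in corners]
        ys = [y for _, y in corners]
        # Register the card in every cell overlapped by its bounding box.
        for i in range(int(min(xs) // cell), int(max(xs) // cell) + 1):
            for j in range(int(min(ys) // cell), int(max(ys) // cell) + 1):
                grid[(i, j)].append(idx)
    return grid

def candidates(grid, point, cell):
    """Indices of the only cards that could possibly cover point."""
    key = (int(point[0] // cell), int(point[1] // cell))
    return grid.get(key, [])
```

Each sampled point then needs an exact point-in-card test only against candidates(grid, point, cell), which is usually a very short list when the cell size is close to the card size.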
There's a lot to be said for this method:
- You can pretty easily change it up to work with non-rectangular cards, esp if they're convex
- You can probably change it up to work with differently sized or shaped cards, if you have to (and in that case, the geometry really gets annoying)
- If you're interviewing at a place that does scientific or engineering work, they'll love it
- It parallelizes really well
- It's so cool!!
On the other hand:
- It's an approximation technique (but you can run to any precision you like!)
- You're in the land of expected runtimes, not deterministic runtimes
- Someone might actually ask you detailed questions about Monte Carlo
- If they're not familiar with Monte Carlo, they might think you're making stuff up
I wish I could take credit for this idea, but alas, I picked it up from a paper calculating surface areas of proteins based on the position and sizes of the atoms in the proteins. (Same basic idea, except now we had a 3D grid in 3-space, and the cards really were disks. We'd go through and for each atom, generate a bunch of points on its surface and see if they were or were not interior to any other atoms.)
EDIT: It occurs to me that the original problem stipulates that the total table area is much larger than the total card area. In this case, an appropriate grid size means that a majority of the grids must be unoccupied. You can also pre-process grid locations, once your hash table is built up, and eliminate those entirely, only generating points inside possibly occupied grid locations. (Basically, perform individual MC estimates on each potentially occluded grid location.)
Answer 3:
Here's an idea that is not perfect but is practically useful. You design an algorithm that depends on an accuracy measure epsilon (eps). Imagine you split the space into squares of size eps x eps. Now you want to count the number of squares lying inside the cards. Let the number of cards be n and let the sides of the cards be h and w.
Here is a naive way to do it:
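S = {} // Hashset of grid cells (i, j), shared by all cards
for every card:
    for i in [floor(min x value of card / eps), floor(max x value of card / eps)]:
        for j in [floor(min y value of card / eps), floor(max y value of card / eps)]:
            // Test the cell center so that every card uses the same grid
            if ((i + 0.5) * eps, (j + 0.5) * eps) is in the card:
                S.add((i, j))
return size(S) * eps * eps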
The algorithm runs in O(n * (S/eps)^2), where S here denotes the side length of a card (roughly max(h, w)), not the hash set in the pseudocode. The error is bounded by (2 * S * n * eps), therefore the relative error is at most (2 * eps * n / S).
So for example, to guarantee an error of less than 1%, you have to choose eps less than S / (200 n), and the algorithm then runs in about 200^2 * n^3 steps.
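A runnable version of the grid-counting idea might look like the sketch below. It assumes each card is a convex polygon given as corners in order, and it keys the set by integer cell indices so that overlapping cards contribute the same cell only once. The function names are hypothetical.

```python
import math

def inside_convex(point, corners):
    """True if point lies in the convex polygon whose corners are listed in order."""
    px, py = point
    sign = 0
    for i in range(len(corners)):
        ax, ay = corners[i]
        bx, by = corners[(i + 1) % len(corners)]
        cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
        if cross != 0:
            if sign == 0:
                sign = 1 if cross > 0 else -1
            elif (cross > 0) != (sign > 0):
                return False
    return True

def grid_area(cards, eps):
    """Approximate the covered area by counting eps x eps cells whose center is covered."""
    covered = set()
    for corners in cards:
        xs = [x for x, _ in corners]
        ys = [y for _, y in corners]
        # Only cells inside the card's bounding box can be covered by it.
        for i in range(math.floor(min(xs) / eps), math.ceil(max(xs) / eps)):
            for j in range(math.floor(min(ys) / eps), math.ceil(max(ys) / eps)):
                center = ((i + 0.5) * eps, (j + 0.5) * eps)
                if inside_convex(center, corners):
                    covered.add((i, j))
    return len(covered) * eps * eps
```

Halving eps roughly halves the error but quadruples both the running time and the size of the set.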
Answer 4:
Suppose there are n cards of unit area. Let T be the area of the table. For the discretised problem, the expected area covered will be
$ T\left(1-\left(\frac{T-1}{T}\right)^n\right) $
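One way to arrive at this expression, under the assumption (not stated in the answer) that the table is split into T unit cells and each card independently lands on one cell chosen uniformly at random: a fixed cell is missed by one card with probability
$ \frac{T-1}{T} $
and therefore missed by all n cards with probability
$ \left(\frac{T-1}{T}\right)^n $
By linearity of expectation over the T cells, the expected number of covered cells is
$ T\left(1-\left(\frac{T-1}{T}\right)^n\right) $
When T is much larger than n, expanding the power to second order gives approximately
$ n - \frac{n(n-1)}{2T} $
that is, the total card area n minus a small correction for the expected overlaps.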
Answer 5:
T = The total area of the table.
C = The total area that could be covered by cards (area of one card times number of cards).
V = The total area of overlapping cards (V = oVerlap)
Covered area = C - V (the uncovered table area is then T - (C - V))
There should be (yep, those are danger words) some way to efficiently analyze the space occupied by the cards, to readily identify overlapping vs. non-overlapping situations. Identify these, factor out all overlapped areas, and you're done.
Time complexity would be in considering each card in order, one by one, and comparing each with each remaining card (card 2 has already been checked against card 1), which makes it n(n-1)/2 comparisons, i.e. O(n^2), not good... but this is where the "should" comes in. There must be some efficient way to remove all cards that do not overlap from consideration, to order cards to make it obvious if they could not possibly overlap other/prior cards, and perhaps to identify or group potentially overlapping cards.
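One common way to do that kind of pruning is to sort the cards' bounding boxes by their left edge and compare each box only against boxes whose x-range is still open. The sketch below shows the idea. The function name and the (x1, y1, x2, y2) layout are assumptions, and a pair reported here still needs an exact overlap test on the actual cards.

```python
def candidate_pairs(boxes):
    """Pairs (i, j), i < j, of bounding boxes (x1, y1, x2, y2) that overlap."""
    # Process boxes from left to right by their left edge.
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])
    active = []  # boxes whose x-range may still reach the current box
    pairs = []
    for i in order:
        x1, y1, x2, y2 = boxes[i]
        # Drop boxes that end before the current box starts.
        active = [j for j in active if boxes[j][2] > x1]
        for j in active:
            # x-ranges overlap by construction, so only y needs checking.
            if boxes[j][1] < y2 and y1 < boxes[j][3]:
                pairs.append((min(i, j), max(i, j)))
        active.append(i)
    return sorted(pairs)
```

The worst case is still quadratic when most boxes overlap in x, but on a table much larger than the cards the active list stays short and the sort dominates.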
Interesting problem.