Question:
I have a large 2-dimensional grid, say 10000 x 10000. From this grid I need to select 1000 random points, and no two of the selected points may be the same. The obvious approach is to check each newly selected point against all previously selected ones, but that seems inefficient for large grids and large numbers of points. Is there a better way to do this? I am using C++.
Answer 1:
Randomly selecting a point and discarding it if it is already in the list of selected points need not be inefficient, as long as you keep the selected points in a sorted collection that supports fast lookup and insertion.
Alternatively, depending on how your points are defined (i.e. whether each one is an instance of a class or struct that you've defined), you could add a boolean member named Selected to the point object. When you pick a point, check whether it has already been marked as Selected. If not, add it to your list and set Selected to true. Otherwise, continue picking random points.
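The sorted-collection idea could be sketched roughly as follows. This is only an illustration under a few assumptions: points are represented as std::pair<int, int>, the grid is addressed from (0, 0) to (width - 1, height - 1), and pickDistinctSorted is a hypothetical helper name, not something from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <set>
#include <utility>

// Hypothetical helper: draws `count` distinct points from a width x height grid.
// std::set keeps its elements sorted, so lookup and insertion are O(log n).
std::set<std::pair<int, int>> pickDistinctSorted(int width, int height,
                                                 std::size_t count,
                                                 std::mt19937_64& rng) {
    assert(width > 0 && height > 0);
    // Asking for more points than the grid holds would loop forever.
    assert(count <= static_cast<std::size_t>(width) *
                        static_cast<std::size_t>(height));

    std::uniform_int_distribution<int> distX(0, width - 1);
    std::uniform_int_distribution<int> distY(0, height - 1);

    std::set<std::pair<int, int>> selected;
    while (selected.size() < count) {
        // insert() leaves the set unchanged when the point is already present,
        // so a duplicate draw simply costs one extra loop iteration.
        selected.insert({distX(rng), distY(rng)});
    }
    return selected;
}
```

Calling pickDistinctSorted(10000, 10000, 1000, rng) with a seeded std::mt19937_64 would return 1000 distinct points in sorted order. The memory used depends only on the number of selected points, whereas a per-point Selected flag needs storage for every cell of the grid.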
Answer 2:
"it seems for large grids and large number of points this will become inefficient"
Not necessarily. There are two potential sources of inefficiency:
- Overhead caused by rejection sampling (that is, having to keep trying until you've found a not-yet-selected point). Given that you're choosing 0.001% of the points, the chances of randomly selecting the same point twice are very small. Therefore, the cost of re-trying should be negligible.
- Overhead of checking whether the randomly chosen point has already been selected. If you store all previously selected points in a suitable data structure, this can be done in O(1) time on average. For this, std::unordered_set would be a good candidate. The size of the set grows linearly with the number of points you need to select and is completely independent of the grid size.
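To put a number on the first point: with k = 1000 draws from N = 10^8 cells, the expected number of rejected draws is roughly k(k-1)/(2N), which is about 0.005, so most runs never retry at all. A minimal sketch of rejection sampling with std::unordered_set might look like the following. It assumes each cell is encoded as a single row-major index (y * width + x), and pickDistinctCells is a hypothetical name chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper: rejection sampling over row-major cell indices.
// Returns `count` distinct (x, y) points in the order they were drawn.
std::vector<std::pair<int, int>> pickDistinctCells(int width, int height,
                                                   std::size_t count,
                                                   std::mt19937_64& rng) {
    assert(width > 0 && height > 0);
    const std::uint64_t w = static_cast<std::uint64_t>(width);
    const std::uint64_t total = w * static_cast<std::uint64_t>(height);
    assert(count <= total);  // otherwise the loop below could never finish

    std::uniform_int_distribution<std::uint64_t> dist(0, total - 1);

    std::unordered_set<std::uint64_t> seen;  // O(1) average lookup and insert
    seen.reserve(count);
    std::vector<std::pair<int, int>> points;
    points.reserve(count);

    while (points.size() < count) {
        const std::uint64_t cell = dist(rng);
        // insert().second is true only if the cell was not in the set yet.
        if (seen.insert(cell).second) {
            points.emplace_back(static_cast<int>(cell % w),
                                static_cast<int>(cell / w));
        }
    }
    return points;
}
```

Encoding a point as one integer avoids having to write a custom hash for a pair or struct, since std::hash already supports std::uint64_t.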
Answer 3:
You could implement an algorithm like this:
1. Create an empty mapping from hashes to points.
2. Select a random point.
3. Calculate its hash.
4. If the hash is already in the mapping, go to step 2.
5. Save the hash and the point in the mapping.
6. If you don't have enough points yet, go to step 2.
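One way this loop could look in C++ is sketched below. It assumes the "hash" is a key that is unique per point (here the row-major index y * width + x), because two different points sharing a hash would otherwise cause a valid point to be rejected. Point, pointKey and pickDistinctByKey are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <unordered_map>

// Hypothetical point type for the example.
struct Point {
    int x;
    int y;
};

// Collision-free key: every cell of the grid maps to a different integer.
std::uint64_t pointKey(const Point& p, int width) {
    return static_cast<std::uint64_t>(p.y) * static_cast<std::uint64_t>(width) +
           static_cast<std::uint64_t>(p.x);
}

std::unordered_map<std::uint64_t, Point> pickDistinctByKey(int width, int height,
                                                           std::size_t count,
                                                           std::mt19937_64& rng) {
    assert(width > 0 && height > 0);
    assert(count <= static_cast<std::size_t>(width) *
                        static_cast<std::size_t>(height));

    std::uniform_int_distribution<int> distX(0, width - 1);
    std::uniform_int_distribution<int> distY(0, height - 1);

    std::unordered_map<std::uint64_t, Point> chosen;  // empty mapping: key -> point
    chosen.reserve(count);
    while (chosen.size() < count) {           // repeat until enough points are stored
        const Point p{distX(rng), distY(rng)};  // select a random point
        const std::uint64_t key = pointKey(p, width);
        // emplace() stores the pair only if the key is new; if the key already
        // exists nothing changes and the loop draws another point.
        chosen.emplace(key, p);
    }
    return chosen;
}
```

If only membership matters and the points can be rebuilt from their keys, a std::unordered_set of keys would do the same job with less storage.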