Suppose we have a hollow square lamina of size n. That is, we have a nxn square from which a k*l rectangle has been removed (1<=k,l<=n-2). I want to calculate an average of distances between 2 random, uniformly distributed points within such hollow square lamina. For the sake of simplicity let's consider n=3, k=l=1, or a 3x3 square from whose center a unit square has been removed
I wrote this code for numpy, but it has at least 2 problems: I have to throw away approximately 1/9 of all generated points and removing the numpy.array elements requires lots of RAM:
x,y = 3*np.random.random((2,size,2))
x = x[
np.logical_not(np.logical_and(
np.logical_and(x[:,0] > 1, x[:,0] < 2),
np.logical_and(x[:,1] > 1, x[:,1] < 2)
))
]
y = y[
np.logical_not(np.logical_and(
np.logical_and(y[:,0] > 1, y[:,0] < 2),
np.logical_and(y[:,1] > 1, y[:,1] < 2)
))
]
n = min(x.shape[0], y.shape[0])
UPD:Here size
is the sample size of which I'm going to calculate average.
Is there an elegant way to generate those points right away, without removing the unfit ones?
UPD: Here is the full code just for the reference:
def calc_avg_dist(size):
x,y = 3*np.random.random((2,size,2))
x = x[
np.logical_not(np.logical_and(
np.logical_and(x[:,0] > 1, x[:,0] < 2),
np.logical_and(x[:,1] > 1, x[:,1] < 2)
))
]
y = y[
np.logical_not(np.logical_and(
np.logical_and(y[:,0] > 1, y[:,0] < 2),
np.logical_and(y[:,1] > 1, y[:,1] < 2)
))
]
n = min(x.shape[0], y.shape[0])
diffs = x[:n,:] - y[:n,:]
return np.sum(np.sqrt(np.einsum('ij,ij->i',diffs,diffs)))/n
I've already posted a shorter and somehow unclear answer, here I've taken the time to produce what I think is a better answer.
Generalizing the OP problem, we have a "surface" composed of
nsc * nsr
squares disposed innsc
columns andnsr
rows, and a "hole" composed ofnhc * nhr
squares (corresponding to the surface squares) disposed in a rectangular arrangement withnhr
rows andnhc
columns, with the origin placed atohc, ohr
Our aim is to draw
n
random points from the surface without the hole (the admissible surface) with a uniform distribution over the admissible surface.We observe that the admissible area is composed of
nsq = nsc*nsr -nhc*nhr
equal squares: if we place a random point in an abstract unit square and then assign, with equal probability, that point to one of the squares of the admissible area, we have done our job.In pseudocode, if
random()
samples from an uniformly distributed random variable over[0, 1)
To speed up the process, we are using
numpy
and we try to avoid, as far as possible, explicitfor
loops.With the names previously used, we need a listing of the origins of the admissible squares, to implement the last line of the pseudo code.
We need to generate
n
random points in the unit square, organized in an array of shape(n,2)
We need an array of
n
admissible squaresWe translate each point in
rand_points
in one of the admissible squares, as specified by the elements ofplacements
.taking advantage of the extended addressing that it is possible for
numpy
arrays, and its done...In an overly compact function
You have 8 equal unit squares in which it is admissible to place points, so draw as many points in an unit square as you want as in
now it suffices to choose at random in which of the 8 admissible squares each point is placed
you need also an array of origins
and finally
To generalize the solution, write a function to compute an array of the origins of the admissible squares, a possible implementation using sets being
and use it like this
With the center removed, there are 8 regions that should contain points. These are their lower-left corners:
The regions are 1x1, so they have the same area and are equally likely to contain a given random point. The following chooses
size
lower-left corners:Now generate
size
(x,y) coordinates in the unit square:Add those to the lower-left corners chosen previously:
pts
has shape(size, 2)
. Here's a plot, withsize = 2000
:Update to address the updated question...
The following function generalizes the above idea to a rectangular shape containing a rectangular hollow. The rectangle is still considered to be nine regions, with the middle region being the hollow. The probability of a random point being in a region is determined by the area of the region;
numpy.random.multinomial
is used to select the number of points in each region.(I'm sure there is room for optimization of this code.)
For example,
Note that the arguments do not have to be integers: