Pythonic way to generate random uniformly distribu

2019-06-25 12:23发布

站内文章 / 前沿技术

21 0

走好不送

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

Suppose we have a hollow square lamina of size n. That is, we have a nxn square from which a k*l rectangle has been removed (1<=k,l<=n-2). I want to calculate an average of distances between 2 random, uniformly distributed points within such hollow square lamina. For the sake of simplicity let's consider n=3, k=l=1, or a 3x3 square from whose center a unit square has been removed

I wrote this code for numpy, but it has at least 2 problems: I have to throw away approximately 1/9 of all generated points and removing the numpy.array elements requires lots of RAM:

    x,y = 3*np.random.random((2,size,2))
x = x[
    np.logical_not(np.logical_and(
        np.logical_and(x[:,0] > 1, x[:,0] < 2),
        np.logical_and(x[:,1] > 1, x[:,1] < 2)
        ))
    ]
y = y[
    np.logical_not(np.logical_and(
        np.logical_and(y[:,0] > 1, y[:,0] < 2),
        np.logical_and(y[:,1] > 1, y[:,1] < 2)
        ))
    ]
n = min(x.shape[0], y.shape[0])

UPD:Here size is the sample size of which I'm going to calculate average. Is there an elegant way to generate those points right away, without removing the unfit ones?

UPD: Here is the full code just for the reference:

def calc_avg_dist(size):
    x,y = 3*np.random.random((2,size,2))
    x = x[
        np.logical_not(np.logical_and(
        np.logical_and(x[:,0] > 1, x[:,0] < 2),
        np.logical_and(x[:,1] > 1, x[:,1] < 2)
        ))
    ]
    y = y[
        np.logical_not(np.logical_and(
        np.logical_and(y[:,0] > 1, y[:,0] < 2),
        np.logical_and(y[:,1] > 1, y[:,1] < 2)
        ))
    ]
    n = min(x.shape[0], y.shape[0])
    diffs = x[:n,:] - y[:n,:]
    return np.sum(np.sqrt(np.einsum('ij,ij->i',diffs,diffs)))/n

回答1:

With the center removed, there are 8 regions that should contain points. These are their lower-left corners:

In [350]: llcorners = np.array([[0, 0], [1, 0], [2, 0], [0, 1], [2, 1], [0, 2], [1, 2], [2, 2]])

The regions are 1x1, so they have the same area and are equally likely to contain a given random point. The following chooses size lower-left corners:

In [351]: corner_indices = np.random.choice(len(llcorners), size=size)

Now generate size (x,y) coordinates in the unit square:

In [352]: unit_coords = np.random.random(size=(size, 2))

Add those to the lower-left corners chosen previously:

In [353]: pts = unit_coords + llcorners[corner_indices]

pts has shape (size, 2). Here's a plot, with size = 2000:

In [363]: plot(pts[:,0], pts[:,1], 'o')
Out[363]: [<matplotlib.lines.Line2D at 0x11000f950>]

Update to address the updated question...

The following function generalizes the above idea to a rectangular shape containing a rectangular hollow. The rectangle is still considered to be nine regions, with the middle region being the hollow. The probability of a random point being in a region is determined by the area of the region; numpy.random.multinomial is used to select the number of points in each region.

(I'm sure there is room for optimization of this code.)

from __future__ import division

import numpy as np


def sample_hollow_lamina(size, outer_width, outer_height, a, b, inner_width, inner_height):
    """
    (a, b) is the lower-left corner of the "hollow".
    """
    llcorners = np.array([[0, 0], [a, 0], [a+inner_width, 0],
                          [0, b], [a+inner_width, b],
                          [0, b+inner_height], [a, b+inner_height], [a+inner_width, b+inner_height]])
    top_height = outer_height - (b + inner_height)
    right_width = outer_width - (a + inner_width)
    widths = np.array([a, inner_width, right_width, a, right_width, a, inner_width, right_width])
    heights = np.array([b, b, b, inner_height, inner_height, top_height, top_height, top_height])
    areas = widths * heights
    shapes = np.column_stack((widths, heights))

    regions = np.random.multinomial(size, areas/areas.sum())
    indices = np.repeat(range(8), regions)
    unit_coords = np.random.random(size=(size, 2))
    pts = unit_coords * shapes[indices] + llcorners[indices]

    return pts

For example,

In [455]: pts = sample_hollow_lamina(2000, 5, 5, 1, 1, 2, 3)

In [456]: plot(pts[:,0], pts[:,1], 'o', alpha=0.75)
Out[456]: [<matplotlib.lines.Line2D at 0x116da0a50>]

In [457]: grid()

Note that the arguments do not have to be integers:

In [465]: pts = sample_hollow_lamina(2000, 3, 3, 0.5, 1.0, 1.5, 0.5)

In [466]: plot(pts[:,0], pts[:,1], 'o', alpha=0.75)
Out[466]: [<matplotlib.lines.Line2D at 0x116e60390>]

In [467]: grid()

回答2:

I've already posted a shorter and somehow unclear answer, here I've taken the time to produce what I think is a better answer.

Generalizing the OP problem, we have a "surface" composed of nsc * nsr squares disposed in nsc columns and nsr rows, and a "hole" composed of nhc * nhr squares (corresponding to the surface squares) disposed in a rectangular arrangement with nhr rows and nhc columns, with the origin placed at ohc, ohr

    +------+------+------+------+------+------+------+------+------+ nsr  
    |      |      |      |      |      |      |      |      |      |  
    |      |      |      |      |      |      |      |      |      |  
 ...+------+------+------+------+------+------+------+------+------+ nsr-1  
    |      |      | oooo | oooo |      |      |      |      |      |  
    |      |      | oooo | oooo |      |      |      |      |      |  
    +------+------+------+------+------+------+------+------+------+ ...  
    |      |      | oooo | oooo |      |      |      |      |      |  
    |      |      | oooo | oooo |      |      |      |      |      |  
    +------+------+------+------+------+------+------+------+------+ ...  
    |      |      | oooo | oooo |      |      |      |      |      |  
    |      |      | oooo | oooo |      |      |      |      |      |  
 ohc+------+------+------+------+------+------+------+------+------+ 1  
    |      |      |      |      |      |      |      |      |      |  
    |      |      |      |      |      |      |      |      |      |  
    +------+------+------+------+------+------+------+------+------+ 0  
    0      1      2     ...                         ...   nsc-1   nsc  
                  |             |  
                ohr=2        ohr+nhr

Our aim is to draw n random points from the surface without the hole (the admissible surface) with a uniform distribution over the admissible surface.

We observe that the admissible area is composed of nsq = nsc*nsr -nhc*nhr equal squares: if we place a random point in an abstract unit square and then assign, with equal probability, that point to one of the squares of the admissible area, we have done our job.

In pseudocode, if random() samples from an uniformly distributed random variable over [0, 1)

(x, y) = (random(), random())
square = integer(random()*nsq)
(x, y) = (x, y) + (column_of_square_origin(square), row_of_square_origin(square))

To speed up the process, we are using numpy and we try to avoid, as far as possible, explicit for loops.

With the names previously used, we need a listing of the origins of the admissible squares, to implement the last line of the pseudo code.

def origins_of_OK_squares(nsc, nsr, ohc, ohr, nhc, nhr):

    # a set of tuples, each one the origin of a square, for ALL the squares
    s_all = {(x, y) for x in range(nsc) for y in range(nsr)}

    # a set of tuples with the origin of the hole squares
    s_hole = {(x, y) for x in range(ohc,ohc+nhc) for y in range(ohr,ohr+nhr)}

    # the set of the origins of admissible squares is the difference
    s_adm = s_all - s_hole

    # return an array with all the origins --- the order is not important!
    # np.array doesn't like  sets
    return np.array(list(s_adm))

We need to generate n random points in the unit square, organized in an array of shape (n,2)

    rand_points = np.random.random((n, 2))

We need an array of n admissible squares

    placements = np.random.randint(0, nsq, n)

We translate each point in rand_points in one of the admissible squares, as specified by the elements of placements.

    rand_points += origins_of_OK_squares(nsc, nsr, ohc, ohr, nhc, nhr)[placements]

taking advantage of the extended addressing that it is possible for numpy arrays, and its done...

In an overly compact function

import numpy as np
def samples_wo_hole(n,  nsc, nsr, ohc, ohr, nhc, nhr):
    s_all = {(x, y) for x in range(nsc) for y in range(nsr)}
    s_hole = {(x, y) for x in range(ohc,ohc+nhc) for y in range(ohr,ohr+nhr)}
    rand_points = np.random.random((n, 2))
    placements = np.random.randint(0, nsc*nsr - nhc*nhr, n)
    return rand_points+np.array(list(s_all-s_hole))[placements]

回答3:

You have 8 equal unit squares in which it is admissible to place points, so draw as many points in an unit square as you want as in

x = np.random(n, 2)

now it suffices to choose at random in which of the 8 admissible squares each point is placed

sq = np.random.randomint(0, 8, n)

you need also an array of origins

delta = np.array([[0, 0],
                  [1, 0],
                  [2, 0],
                  [0, 1],
                  # no central square
                  [2, 1],
                  [0, 2],
                  [1, 2]
                  [2, 2]])

and finally

x = x + delta[sq]

To generalize the solution, write a function to compute an array of the origins of the admissible squares, a possible implementation using sets being

def origins(n, hole_xor, hole_wd, hole_yor, hole_hg):
    all_origins = {(x,y) for x in range(n) for y in range(n)}
    hole_origins = {(x,y) for x in range(hole_xor, hole_xor+hole_wd)
                          for y in range(hole_yor, hole_yor+hole_hg)}
    return np.array(list(all_origins-hole_origins)))

and use it like this

delta = origins(12,  4,5,  6,2)
n_squares = len(delta) # or n*n - width*height
target_square = np.random.randomint(0, n_squares, size)
x = x + delta[target_square]