NumPy has the random.choice function, which allows you to sample from a categorical distribution. How would you repeat this over an axis? To illustrate what I mean, here is my current code:
import numpy as np

categorical_distributions = np.array([
    [.1, .3, .6],
    [.2, .4, .4],
])
_, n = categorical_distributions.shape
np.array([np.random.choice(n, p=row)
          for row in categorical_distributions])
Ideally, I would like to eliminate the for loop.
Here's one vectorized way to get the random indices per row, with a as the 2D array of probabilities -
Generalizing to cover both along the rows and columns for a 2D array -
Let's verify with the given sample by running it over a million times -
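A minimal sketch of how such a vectorized sampler could look, assuming inverse-CDF sampling: take the cumulative sum of the probabilities along the chosen axis, draw one uniform number per slice, and pick the first position whose cumulative probability exceeds the draw. The function name random_choice_prob_index, the rng parameter, and the 200,000 draws per row (fewer than the million mentioned above, to keep the run short) are assumptions made for illustration, not necessarily the answer's own code.

```python
import numpy as np

def random_choice_prob_index(a, axis=1, rng=None):
    """Draw one index along `axis` for every slice of the 2D probability array `a`."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a, dtype=float)
    # One uniform draw per slice, shaped so it broadcasts against the cumulative sums.
    r = np.expand_dims(rng.random(a.shape[1 - axis]), axis=axis)
    # Index of the first position whose cumulative probability exceeds the draw.
    return (a.cumsum(axis=axis) > r).argmax(axis=axis)

a = np.array([
    [.1, .3, .6],
    [.2, .4, .4],
])
rng = np.random.default_rng(0)

# Verification: stack the sample many times, sample every row in one call,
# then compare the empirical frequencies with the input probabilities.
n_draws = 200_000
draws = random_choice_prob_index(np.tile(a, (n_draws, 1)), rng=rng)
draws = draws.reshape(n_draws, a.shape[0])  # column i holds the draws for row i
freq = np.stack([
    np.bincount(draws[:, i], minlength=a.shape[1]) for i in range(a.shape[0])
]) / n_draws
print(freq)  # should be close to `a`

# Sampling along the columns instead: here every column must sum to 1.
cols = random_choice_prob_index(a.T, axis=0, rng=rng)
```

One caveat with this sketch: if floating-point rounding leaves the last cumulative sum slightly below 1 and the uniform draw lands above it, the comparison is all False and argmax falls back to index 0. That case is rare, but forcing the last cumulative value to 1 would rule it out.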
Runtime test
Original loopy way -
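For timing purposes, the loop from the question can be wrapped in a function. This is only a sketch; the name loopy_choice is made up here.

```python
import numpy as np

def loopy_choice(probs):
    """Sample one index per row of `probs` with an explicit Python loop."""
    n = probs.shape[1]
    return np.array([np.random.choice(n, p=row) for row in probs])

categorical_distributions = np.array([
    [.1, .3, .6],
    [.2, .4, .4],
])
sample = loopy_choice(categorical_distributions)  # one index per row
```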
Timings on bigger array -
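A rough way to compare the two approaches on a larger input, using timeit from the standard library. The array size (2000 x 100), the function names, and the repeat count are arbitrary choices for illustration; the numbers will depend on the machine and the NumPy version, so none are quoted here.

```python
import timeit
import numpy as np

def loopy_choice(probs):
    # Loop from the question: one np.random.choice call per row.
    n = probs.shape[1]
    return np.array([np.random.choice(n, p=row) for row in probs])

def vectorized_choice(probs):
    # Cumulative-sum comparison: one uniform draw per row, no Python loop.
    r = np.random.rand(probs.shape[0])[:, None]
    return (probs.cumsum(axis=1) > r).argmax(axis=1)

# Bigger test array: random non-negative rows, normalized to sum to 1.
rng = np.random.default_rng(0)
big = rng.random((2000, 100))
big /= big.sum(axis=1, keepdims=True)

# Best of three single runs for each approach.
t_loop = min(timeit.repeat(lambda: loopy_choice(big), number=1, repeat=3))
t_vec = min(timeit.repeat(lambda: vectorized_choice(big), number=1, repeat=3))
print(f"loop: {t_loop:.4f} s, vectorized: {t_vec:.4f} s")
```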