How to get the index of numpy.random.choice?

Posted 2020-05-20 09:49

Question:

Is it possible to modify the numpy.random.choice function so that it returns the index of the chosen element? Basically, I want to create a list and select elements from it randomly without replacement.

>>> import numpy as np
>>> a = [1,4,1,3,3,2,1,4]
>>> np.random.choice(a)
4
>>> a
[1, 4, 1, 3, 3, 2, 1, 4]

a.remove(np.random.choice(a)) removes the first element of the list that has the chosen value (a[1] in the example above), which may not be the element that was actually chosen (e.g., a[7]).

Answer 1:

Here's one way to find out the index of a randomly selected element:

import random # plain random module, not numpy's
random.choice(list(enumerate(a)))[0]
=> 4      # just an example, index is 4

Or you could retrieve the element and the index in a single step:

random.choice(list(enumerate(a)))
=> (1, 4) # just an example, index is 1 and element is 4
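
As a minimal sketch of how this helps with the removal problem in the question, one could draw a position rather than a value and then delete exactly that position. The name `remaining` and the use of `random.randrange` are illustrative choices, not part of the answer above.

```python
import random

a = [1, 4, 1, 3, 3, 2, 1, 4]
remaining = list(a)  # work on a copy so the original list stays intact

# Draw an index rather than a value, then remove exactly that position.
i = random.randrange(len(remaining))
chosen = remaining.pop(i)

print(f"removed index {i} (value {chosen}); left: {remaining}")
```

Because the element is removed by position, duplicates elsewhere in the list are unaffected.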


Answer 2:

Regarding your first question, you can work the other way around: randomly choose from the indices of the array a, and then fetch the value.

>>> import numpy as np
>>> a = [1,4,1,3,3,2,1,4]
>>> a = np.array(a)
>>> np.random.choice(np.arange(a.size))
6
>>> a[6]
1

But if you just need a random sample without replacement, replace=False will do. I can't remember exactly when it was first added to random.choice (it might have been NumPy 1.7.0), so it may not work if you are running a very old NumPy. Keep in mind that the default is replace=True.
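
Combining the two ideas, a rough sketch of drawing several distinct indices at once might look like the following. It relies on np.random.choice accepting an integer n as shorthand for np.arange(n); the sample size of 3 is an arbitrary assumption.

```python
import numpy as np

a = np.array([1, 4, 1, 3, 3, 2, 1, 4])

# Passing an integer n samples from np.arange(n), so this draws
# 3 distinct positions in a (replace=False forbids repeats).
idx = np.random.choice(a.size, size=3, replace=False)
values = a[idx]  # fancy indexing fetches the values at those positions

print("indices:", idx, "values:", values)
```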



Answer 3:

numpy.random.choice(a, size=however_many, replace=False)

If you want a sample without replacement, just ask numpy to make you one. Don't loop and draw items repeatedly. That'll produce bloated code and horrible performance.

Example:

>>> a = numpy.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> numpy.random.choice(a, size=5, replace=False)
array([7, 5, 8, 6, 2])

On a sufficiently recent NumPy (at least 1.17), you should use the new randomness API, which fixes a longstanding performance issue where the old API's replace=False code path unnecessarily generated a complete permutation of the input under the hood:

rng = numpy.random.default_rng()
result = rng.choice(a, size=however_many, replace=False)
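
If the indices are what you need, the same Generator API can sample positions instead of values. This is only a sketch: the seed and the sample size `k` are placeholders chosen for illustration.

```python
import numpy as np

a = np.array([1, 4, 1, 3, 3, 2, 1, 4])
rng = np.random.default_rng(seed=0)  # seed chosen only for repeatability

k = 3  # hypothetical sample size
idx = rng.choice(a.size, size=k, replace=False)  # k distinct positions
sample = a[idx]

print("indices:", idx, "sample:", sample)
```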


Answer 4:

This is a bit out of left field compared with the other answers, but I thought it might help with what it sounds like you're trying to do in a slightly larger sense. You can generate a random sample without replacement by shuffling the indices of the elements in the source array:

source = np.random.randint(0, 100, size=100) # generate a set to sample from
idx = np.arange(len(source))
np.random.shuffle(idx)
subsample = source[idx[:10]]

This will create a sample (here, of size 10) by drawing elements from the source set (here, of size 100) without replacement.

You can interact with the non-selected elements by using the remaining index values, i.e.:

notsampled = source[idx[10:]]
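
The same shuffled-index idea extends to drawing in several rounds. The sketch below assumes the newer Generator API (`rng.permutation`) and an arbitrary batch size of 3; neither comes from the answer above.

```python
import numpy as np

rng = np.random.default_rng()
source = np.array([1, 4, 1, 3, 3, 2, 1, 4])

# A random ordering of the positions 0..n-1; each index appears exactly once.
order = rng.permutation(source.size)

batch_size = 3  # hypothetical batch size
batches = [order[i:i + batch_size] for i in range(0, order.size, batch_size)]

for b in batches:
    print("indices", b, "values", source[b])
```

Each batch is a draw without replacement with respect to all earlier batches, since no index is repeated in `order`.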


Answer 5:

Instead of using choice, you can also simply random.shuffle your array, i.e.

random.shuffle(a)  # will shuffle a in-place
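
After shuffling, popping from the end of the list gives draws without replacement, and the length of the list tells you how many elements are left. A small sketch, assuming a plain Python list and a copy named `pool` (an illustrative name):

```python
import random

a = [1, 4, 1, 3, 3, 2, 1, 4]
pool = list(a)        # copy, so the original order of a is preserved
random.shuffle(pool)  # in-place shuffle of the copy

drawn = [pool.pop() for _ in range(3)]  # three draws without replacement
print("drawn:", drawn, "left:", len(pool))
```

Note that this approach discards the original ordering of the remaining elements.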


Answer 6:

Based on your comment:

The sample is already a. I want to work directly with a so that I can control how many elements are still left and perform other operations with a. – HappyPy

it sounds to me like you're interested in working with a after n randomly selected elements are removed. Instead, why not work with N = len(a) - n randomly selected elements from a? Since you want them to still be in the original order, you can select from indices like in @CTZhu's answer, but then sort them and grab from the original list:

import numpy as np
n = 3 #number to 'remove'
a = np.array([1,4,1,3,3,2,1,4])
i = np.random.choice(np.arange(a.size), a.size-n, replace=False)
i.sort()
a[i]
#array([1, 4, 1, 3, 1])

So now you can save that as a again:

a = a[i]

and work with a with n elements removed.
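
An equivalent way to express this is to pick the positions to drop and let np.delete do the rest, which keeps the surviving elements in their original order. The helper name `remove_random` below is hypothetical, and the use of the Generator API is an assumption.

```python
import numpy as np

def remove_random(arr, n, rng=None):
    """Return (kept, removed_idx): arr with n randomly chosen positions removed."""
    rng = np.random.default_rng() if rng is None else rng
    removed_idx = rng.choice(arr.size, size=n, replace=False)
    # np.delete returns a new array without the given positions, order preserved.
    return np.delete(arr, removed_idx), removed_idx

a = np.array([1, 4, 1, 3, 3, 2, 1, 4])
kept, removed_idx = remove_random(a, 3)
print("removed positions:", removed_idx, "kept:", kept)
```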



Answer 7:

Here is a simple solution: just choose from the range of indices.

import numpy as np
a = [100,400,100,300,300,200,100,400]
I=np.random.choice(np.arange(len(a)))
print('index is '+str(I)+' number is '+str(a[I]))


Answer 8:

This may be late, but it is worth mentioning because I think the simplest way to do this is:

import numpy as np
a = [1,4,1,3,3,2,1,4]
n = len(a)
idx = np.random.choice(list(range(n)), p=np.ones(n)/n)

It means you are choosing from the indices uniformly. In a more general case, you can do a weighted sampling (and return the index) in this way:

probs = [.3, .4, .2, 0, .1]
n = len(probs)  # p must contain one probability per index
idx = np.random.choice(list(range(n)), p=probs)

If you repeat this many times (e.g. 1e5), the histogram of the chosen indices will look like [0.30126 0.39817 0.19986 0. 0.10071] in this case, which is correct.
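
A frequency check of this kind can be sketched as follows. The seed, the Generator API, and np.bincount are choices made for this illustration, so the exact frequencies will differ from the figures quoted above.

```python
import numpy as np

probs = [.3, .4, .2, 0, .1]
n = len(probs)
rng = np.random.default_rng(seed=0)  # arbitrary seed, only for repeatability

draws = rng.choice(n, size=100_000, p=probs)  # 1e5 weighted index draws
# Count how often each index was drawn and normalize to relative frequencies.
freq = np.bincount(draws, minlength=n) / draws.size

print(freq)
```

The index with probability 0 should never appear, and the other frequencies should land close to their weights.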

In any case, choose from the indices and, if you need weighting, pass the corresponding values as their probabilities.