How to select value from array that is closest to

I have an array of values that I want to replace with from an array of choices based on which choice is linearly closest.

The catch is the size of the choices is defined at runtime.

import numpy as np
a = np.array([[0, 0, 0], [4, 4, 4], [9, 9, 9]])
choices = np.array([1, 5, 10])

If choices was static in size, I would simply use np.where

d = np.where(np.abs(a - choices[0]) > np.abs(a - choices[1]), 
      np.where(np.abs(a - choices[0]) > np.abs(a - choices[2]), choices[0], choices[2]),
         np.where(np.abs(a - choices[1]) > np.abs(a - choices[2]), choices[1], choices[2]))

To get the output:

>>d
>>[[1, 1, 1], [5, 5, 5], [10, 10, 10]]

Is there a way to do this more dynamically while still preserving the vectorization.

标签： python performance numpy vectorization

3条回答

对你真心纯属浪费

2楼-- · 2019-05-26 07:02

I love broadcasting and would have gone that way myself too. But, with large arrays, I would like to suggest another approach with np.searchsorted that keeps it memory efficient and thus achieves performance benefits, like so -

def searchsorted_app(a, choices):
    lidx = np.searchsorted(choices, a, 'left').clip(max=choices.size-1)
    ridx = (np.searchsorted(choices, a, 'right')-1).clip(min=0)
    cl = np.take(choices,lidx) # Or choices[lidx]
    cr = np.take(choices,ridx) # Or choices[ridx]
    mask = np.abs(a - cl) > np.abs(a - cr)
    cl[mask] = cr[mask]
    return cl

Please note that if the elements in choices are not sorted, we need to add in the additional argument sorter with np.searchsorted.

Runtime test -

In [160]: # Setup inputs
     ...: a = np.random.rand(100,100)
     ...: choices = np.sort(np.random.rand(100))
     ...: 

In [161]: def broadcasting_app(a, choices): # @wwii's solution
     ...:     return choices[np.argmin(np.abs(a[:,:,None] - choices),-1)]
     ...: 

In [162]: np.allclose(broadcasting_app(a,choices),searchsorted_app(a,choices))
Out[162]: True

In [163]: %timeit broadcasting_app(a, choices)
100 loops, best of 3: 9.3 ms per loop

In [164]: %timeit searchsorted_app(a, choices)
1000 loops, best of 3: 1.78 ms per loop

0人赞添加讨论(0) 举报

倾城　Initia

3楼-- · 2019-05-26 07:08

To explain wwii's excellent answer in a little more detail:

The idea is to create a new dimension which does the job of comparing each element of a to each element in choices using numpy broadcasting. This is easily done for an arbitrary number of dimensions in a using the ellipsis syntax:

>>> b = np.abs(a[..., np.newaxis] - choices)
array([[[ 1,  5, 10],
        [ 1,  5, 10],
        [ 1,  5, 10]],
       [[ 3,  1,  6],
        [ 3,  1,  6],
        [ 3,  1,  6]],
       [[ 8,  4,  1],
        [ 8,  4,  1],
        [ 8,  4,  1]]])

Taking argmin along the axis you just created (the last axis, with label -1) gives you the desired index in choices that you want to substitute:

>>> np.argmin(b, axis=-1)
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2]])

Which finally allows you to choose those elements from choices:

>>> d = choices[np.argmin(b, axis=-1)]
>>> d
array([[ 1,  1,  1],
       [ 5,  5,  5],
       [10, 10, 10]])

For a non-symmetric shape:

Let's say a had shape (2, 5):

>>> a = np.arange(10).reshape((2, 5))
>>> a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

Then you'd get:

>>> b = np.abs(a[..., np.newaxis] - choices)
>>> b
array([[[ 1,  5, 10],
        [ 0,  4,  9],
        [ 1,  3,  8],
        [ 2,  2,  7],
        [ 3,  1,  6]],

       [[ 4,  0,  5],
        [ 5,  1,  4],
        [ 6,  2,  3],
        [ 7,  3,  2],
        [ 8,  4,  1]]])

This is hard to read, but what it's saying is, b has shape:

>>> b.shape
(2, 5, 3)

The first two dimensions came from the shape of a, which is also (2, 5). The last dimension is the one you just created. To get a better idea:

>>> b[:, :, 0]  # = abs(a - 1)
array([[1, 0, 1, 2, 3],
       [4, 5, 6, 7, 8]])
>>> b[:, :, 1]  # = abs(a - 5)
array([[5, 4, 3, 2, 1],
       [0, 1, 2, 3, 4]])
>>> b[:, :, 2]  # = abs(a - 10)
array([[10,  9,  8,  7,  6],
       [ 5,  4,  3,  2,  1]])

Note how b[:, :, i] is the absolute difference between a and choices[i], for each i = 1, 2, 3.

Hope that helps explain this a little more clearly.

0人赞添加讨论(0) 举报

叼着烟拽天下

4楼-- · 2019-05-26 07:21

Subtract choices from a, find the index of the minimum of the result, substitute.

a = np.array([[0, 0, 0], [4, 4, 4], [9, 9, 9]])
choices = np.array([1, 5, 10])
b = a[:,:,None] - choices
np.absolute(b,b)
i = np.argmin(b, axis = -1)
a = choices[i]
print a

>>> 
[[ 1  1  1]
 [ 5  5  5]
 [10 10 10]]

a = np.array([[0, 3, 0], [4, 8, 4], [9, 1, 9]])
choices = np.array([1, 5, 10])
b = a[:,:,None] - choices
np.absolute(b,b)
i = np.argmin(b, axis = -1)
a = choices[i]
print a

>>>    
[[ 1  1  1]
 [ 5 10  5]
 [10  1 10]]
>>>

The extra dimension was added to a so that each element of choices would be subtracted from each element of a. choices was broadcast against a in the third dimension, This link has a decent graphic. b.shape is (3,3,3). EricsBroadcastingDoc is a pretty good explanation and has a graphic 3-d example at the end.

For the second example:

>>> print b
[[[ 1  5 10]
  [ 2  2  7]
  [ 1  5 10]]

 [[ 3  1  6]
  [ 7  3  2]
  [ 3  1  6]]

 [[ 8  4  1]
  [ 0  4  9]
  [ 8  4  1]]]
>>> print i
[[0 0 0]
 [1 2 1]
 [2 0 2]]
>>>

The final assignment uses an Index Array or Integer Array Indexing.

In the second example, notice that there was a tie for element a[0,1] , either one or five could have been substituted.

0人赞添加讨论(0) 举报

How to select value from array that is closest to

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间