Find nearest value in numpy array

2019-01-01 04:45发布

Is there a numpy-thonic way, e.g. function, to find the nearest value in an array?

Example:

np.find_nearest( array, value )

15条回答
情到深处是孤独
2楼-- · 2019-01-01 05:21

Here is a fast vectorized version of @Dimitri's solution if you have many values to search for (values can be multi-dimensional array):

#`values` should be sorted
def get_closest(array, values):
    #make sure array is a numpy array
    array = np.array(array)

    # get insert positions
    idxs = np.searchsorted(array, values, side="left")

    # find indexes where previous index is closer
    prev_idx_is_less = ((idxs == len(array))|(np.fabs(values - array[np.maximum(idxs-1, 0)]) < np.fabs(values - array[np.minimum(idxs, len(array)-1)])))
    idxs[prev_idx_is_less] -= 1

    return array[idxs]

Benchmarks

> 100 times faster than using a for loop with @Demitri's solution`

>>> %timeit ar=get_closest(np.linspace(1, 1000, 100), np.random.randint(0, 1050, (1000, 1000)))
139 ms ± 4.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit ar=[find_nearest(np.linspace(1, 1000, 100), value) for value in np.random.randint(0, 1050, 1000*1000)]
took 21.4 seconds
查看更多
若你有天会懂
3楼-- · 2019-01-01 05:23

With slight modification, the answer above works with arrays of arbitrary dimension (1d, 2d, 3d, ...):

def find_nearest(a, a0):
    "Element in nd array `a` closest to the scalar value `a0`"
    idx = np.abs(a - a0).argmin()
    return a.flat[idx]

Or, written as a single line:

a.flat[np.abs(a - a0).argmin()]
查看更多
与君花间醉酒
4楼-- · 2019-01-01 05:24
import numpy as np
def find_nearest(array, value):
    array = np.array(array)
    z=np.abs(array-value)
    y= np.where(z == z.min())
    m=np.array(y)
    x=m[0,0]
    y=m[1,0]
    near_value=array[x,y]

    return near_value

array =np.array([[60,200,30],[3,30,50],[20,1,-50],[20,-500,11]])
print(array)
value = 0
print(find_nearest(array, value))
查看更多
回忆,回不去的记忆
5楼-- · 2019-01-01 05:24

Maybe helpful for ndarrays:

def find_nearest(X, value):
    return X[np.unravel_index(np.argmin(np.abs(X - value)), X.shape)]
查看更多
春风洒进眼中
6楼-- · 2019-01-01 05:25

Summary of answer: If one has a sorted array then the bisection code (given below) performs the fastest. ~100-1000 times faster for large arrays, and ~2-100 times faster for small arrays. It does not require numpy either. If you have an unsorted array then if array is large, one should consider first using an O(n logn) sort and then bisection, and if array is small then method 2 seems the fastest.

First you should clarify what you mean by nearest value. Often one wants the interval in an abscissa, e.g. array=[0,0.7,2.1], value=1.95, answer would be idx=1. This is the case that I suspect you need (otherwise the following can be modified very easily with a followup conditional statement once you find the interval). I will note that the optimal way to perform this is with bisection (which I will provide first - note it does not require numpy at all and is faster than using numpy functions because they perform redundant operations). Then I will provide a timing comparison against the others presented here by other users.

Bisection:

def bisection(array,value):
    '''Given an ``array`` , and given a ``value`` , returns an index j such that ``value`` is between array[j]
    and array[j+1]. ``array`` must be monotonic increasing. j=-1 or j=len(array) is returned
    to indicate that ``value`` is out of range below and above respectively.'''
    n = len(array)
    if (value < array[0]):
        return -1
    elif (value > array[n-1]):
        return n
    jl = 0# Initialize lower
    ju = n-1# and upper limits.
    while (ju-jl > 1):# If we are not yet done,
        jm=(ju+jl) >> 1# compute a midpoint with a bitshift
        if (value >= array[jm]):
            jl=jm# and replace either the lower limit
        else:
            ju=jm# or the upper limit, as appropriate.
        # Repeat until the test condition is satisfied.
    if (value == array[0]):# edge cases at bottom
        return 0
    elif (value == array[n-1]):# and top
        return n-1
    else:
        return jl

Now I'll define the code from the other answers, they each return an index:

import math
import numpy as np

def find_nearest1(array,value):
    idx,val = min(enumerate(array), key=lambda x: abs(x[1]-value))
    return idx

def find_nearest2(array, values):
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    return indices

def find_nearest3(array, values):
    values = np.atleast_1d(values)
    indices = np.abs(np.int64(np.subtract.outer(array, values))).argmin(0)
    out = array[indices]
    return indices

def find_nearest4(array,value):
    idx = (np.abs(array-value)).argmin()
    return idx


def find_nearest5(array, value):
    idx_sorted = np.argsort(array)
    sorted_array = np.array(array[idx_sorted])
    idx = np.searchsorted(sorted_array, value, side="left")
    if idx >= len(array):
        idx_nearest = idx_sorted[len(array)-1]
    elif idx == 0:
        idx_nearest = idx_sorted[0]
    else:
        if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
            idx_nearest = idx_sorted[idx-1]
        else:
            idx_nearest = idx_sorted[idx]
    return idx_nearest

def find_nearest6(array,value):
    xi = np.argmin(np.abs(np.ceil(array[None].T - value)),axis=0)
    return xi

Now I'll time the codes: Note methods 1,2,4,5 don't correctly give the interval. Methods 1,2,4 round to nearest point in array (e.g. >=1.5 -> 2), and method 5 always rounds up (e.g. 1.45 -> 2). Only methods 3, and 6, and of course bisection give the interval properly.

array = np.arange(100000)
val = array[50000]+0.55
print( bisection(array,val))
%timeit bisection(array,val)
print( find_nearest1(array,val))
%timeit find_nearest1(array,val)
print( find_nearest2(array,val))
%timeit find_nearest2(array,val)
print( find_nearest3(array,val))
%timeit find_nearest3(array,val)
print( find_nearest4(array,val))
%timeit find_nearest4(array,val)
print( find_nearest5(array,val))
%timeit find_nearest5(array,val)
print( find_nearest6(array,val))
%timeit find_nearest6(array,val)

(50000, 50000)
100000 loops, best of 3: 4.4 µs per loop
50001
1 loop, best of 3: 180 ms per loop
50001
1000 loops, best of 3: 267 µs per loop
[50000]
1000 loops, best of 3: 390 µs per loop
50001
1000 loops, best of 3: 259 µs per loop
50001
1000 loops, best of 3: 1.21 ms per loop
[50000]
1000 loops, best of 3: 746 µs per loop

For a large array bisection gives 4us compared to next best 180us and longest 1.21ms (~100 - 1000 times faster). For smaller arrays it's ~2-100 times faster.

查看更多
余生无你
7楼-- · 2019-01-01 05:26

All the answers are beneficial to gather the information to write efficient code. However, I have written a small Python script to optimize for various cases. It will be the best case if the provided array is sorted. If one searches the index of the nearest point of a specified value, then bisect module is the most time efficient. When one search the indices correspond to an array, the numpy searchsorted is most efficient.

import numpy as np
import bisect
xarr = np.random.rand(int(1e7))

srt_ind = xarr.argsort()
xar = xarr.copy()[srt_ind]
xlist = xar.tolist()
bisect.bisect_left(xlist, 0.3)

In [63]: %time bisect.bisect_left(xlist, 0.3) CPU times: user 0 ns, sys: 0 ns, total: 0 ns Wall time: 22.2 µs

np.searchsorted(xar, 0.3, side="left")

In [64]: %time np.searchsorted(xar, 0.3, side="left") CPU times: user 0 ns, sys: 0 ns, total: 0 ns Wall time: 98.9 µs

randpts = np.random.rand(1000)
np.searchsorted(xar, randpts, side="left")

%time np.searchsorted(xar, randpts, side="left") CPU times: user 4 ms, sys: 0 ns, total: 4 ms Wall time: 1.2 ms

If we follow the multiplicative rule, then numpy should take ~100 ms which implies ~83X faster.

查看更多
登录 后发表回答