Replace NaN's in NumPy array with closest non-

I have a NumPy array a like the following:

>>> str(a)
'[        nan         nan         nan  1.44955726  1.44628034  1.44409573\n  1.4408188   1.43657094  1.43171624  1.42649744  1.42200684  1.42117704\n  1.42040255  1.41922908         nan         nan         nan         nan\n         nan         nan]'

I want to replace each NaN with the closest non-NaN value, so that all of the NaN's at the beginning get set to 1.449... and all of the NaN's at the end get set to 1.419....

I can see how to do this for specific cases like this, but I need to be able to do it generally for any length of array, with any length of NaN's at the beginning and end of the array (there will be no NaN's in the middle of the numbers). Any ideas?

I can find the NaN's easily enough with np.isnan(), but I can't work out how to get the closest value to each NaN.

Tags: python arrays numpy nan

7 answers

可以哭但决不认输i

#2 · 2020-06-07 07:40

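I came up with something like this:

# Indexes of the non-NaN entries
valid = [i for i in range(len(a)) if not np.isnan(a[i])]
# Before the first valid index use the first valid value, after the last use the last
a = [a[valid[0]] if x < valid[0] else (a[valid[-1]] if x > valid[-1] else a[x]) for x in range(len(a))]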

It's a bit clunky though given it's split up in two lines with nested inline if's in one of them.


地球回转人心会变

#3 · 2020-06-07 07:47

NaNs have the interesting property of comparing unequal to themselves, so we can quickly find the indexes of the non-NaN elements:

idx = np.nonzero(a==a)[0]
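
As a quick check of that property, here is a minimal sketch showing that a == a gives the same mask as ~np.isnan(a). The sample array is made up for illustration and is not the one from the question.

```python
import numpy as np

# Hypothetical sample data, not the array from the question.
a = np.array([np.nan, 1.5, 2.5, np.nan])

# NaN is the only float value that is not equal to itself,
# so this mask is True exactly where a holds a real number.
self_equal = (a == a)

# The same mask, written more explicitly with np.isnan.
not_nan = ~np.isnan(a)

# Indexes of the non-NaN elements.
idx = np.nonzero(self_equal)[0]
print(idx)  # [1 2]
```

The np.isnan form is probably the more readable of the two, since the intent is spelled out.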

It's now easy to replace the NaNs with the desired value:

for i in range(0, idx[0]):
    a[i]=a[idx[0]]
for i in range(idx[-1]+1, a.size):
    a[i]=a[idx[-1]]

Finally, we can put this in a function:

import numpy as np

def FixNaNs(arr):
    if len(arr.shape)>1:
        raise Exception("Only 1D arrays are supported.")
    idxs=np.nonzero(arr==arr)[0]

    if len(idxs)==0:
        return None

    ret=arr  # no copy is made: arr itself is modified in place

    for i in range(0, idxs[0]):
        ret[i]=ret[idxs[0]]

    for i in range(idxs[-1]+1, ret.size):
        ret[i]=ret[idxs[-1]]

    return ret

edit

Ouch, coming from C++ I always forget about list ranges... @aix's solution is far more elegant and efficient than my C++ish loops, so use that instead of mine.


在下西门庆

#4 · 2020-06-07 07:52

A recursive solution!

import numpy as np

def replace_leading_NaN(a, offset=0):
    # Walk right until a non-NaN value is found, then fill on the way back
    if np.isnan(a[offset]):
        new_value = replace_leading_NaN(a, offset + 1)
        a[offset] = new_value
        return new_value
    else:
        return a[offset]

def replace_trailing_NaN(a, offset=-1):
    # Same idea, walking left from the end of the array
    if np.isnan(a[offset]):
        new_value = replace_trailing_NaN(a, offset - 1)
        a[offset] = new_value
        return new_value
    else:
        return a[offset]


老娘就宠你

#5 · 2020-06-07 07:54

I want to replace each NaN with the closest non-NaN value... there will be no NaN's in the middle of the numbers

The following will do it:

ind = np.where(~np.isnan(a))[0]
first, last = ind[0], ind[-1]
a[:first] = a[first]
a[last + 1:] = a[last]

This is a straight numpy solution requiring no Python loops, no recursion, no list comprehensions etc.
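
If you want this as a reusable function, the following is one possible sketch. The name fill_edge_nans is made up here, and returning a copy and leaving an all-NaN input unchanged are my own assumptions, not something the question asked for.

```python
import numpy as np

def fill_edge_nans(a):
    """Return a copy of 1-D input `a` with leading/trailing NaNs filled.

    Hypothetical helper; assumes NaNs occur only at the two ends.
    """
    out = np.array(a, dtype=float)  # copy, so the caller's data is untouched
    ind = np.where(~np.isnan(out))[0]
    if ind.size == 0:
        return out  # no valid value to fill from; returned as-is
    first, last = ind[0], ind[-1]
    out[:first] = out[first]        # leading NaNs take the first valid value
    out[last + 1:] = out[last]      # trailing NaNs take the last valid value
    return out

filled = fill_edge_nans([np.nan, np.nan, 1.5, 1.25, np.nan])
print(filled)
```

The size check avoids the IndexError that ind[0] would raise when every element is NaN.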


叛逆

#6 · 2020-06-07 07:54

As an alternative solution (this will also linearly interpolate over NaNs in the middle of the array):

import numpy as np

# Generate data...
data = np.random.random(10)
data[:2] = np.nan
data[-1] = np.nan
data[4:6] = np.nan

print(data)

# Fill in NaN's...
mask = np.isnan(data)
data[mask] = np.interp(np.flatnonzero(mask), np.flatnonzero(~mask), data[~mask])

print(data)

This yields:

[        nan         nan  0.31619306  0.25818765         nan         nan
  0.27410025  0.23347532  0.02418698         nan]

[ 0.31619306  0.31619306  0.31619306  0.25818765  0.26349185  0.26879605
  0.27410025  0.23347532  0.02418698  0.02418698]
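
The edges get the nearest value because np.interp, by default, returns the first known value for points left of the known range and the last known value for points right of it. Here is a small reproducible sketch of that behavior, using fixed data that I made up instead of random numbers:

```python
import numpy as np

# Hypothetical fixed data so the result is reproducible.
data = np.array([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])

mask = np.isnan(data)
# x: positions of the NaNs, xp: positions of the known values, fp: known values
data[mask] = np.interp(np.flatnonzero(mask), np.flatnonzero(~mask), data[~mask])

print(data.tolist())  # [1.0, 1.0, 2.0, 3.0, 4.0, 4.0]
```

The interior gap is filled linearly (2.0 and 3.0), while the two end positions are clamped to 1.0 and 4.0.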


爷、活的狠高调

#7 · 2020-06-07 07:55

I came across this problem and had to find a custom solution for scattered NaNs. The code below replaces each NaN with the first number to its right; if there is none, it uses the first number to its left instead. It could be extended to use the mean of the two neighboring numbers.

import numpy as np

Data = np.array([np.nan,1.3,np.nan,1.4,np.nan,np.nan])

nansIndx = np.where(np.isnan(Data))[0]
isanIndx = np.where(~np.isnan(Data))[0]
for nan in nansIndx:
    replacementCandidates = np.where(isanIndx>nan)[0]
    if replacementCandidates.size != 0:
        replacement = Data[isanIndx[replacementCandidates[0]]]
    else:
        replacement = Data[isanIndx[np.where(isanIndx<nan)[0][-1]]]
    Data[nan] = replacement

Result is:

>>> Data
array([ 1.3,  1.3,  1.4,  1.4,  1.4,  1.4])
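
If you need the value that is actually closest by position, not just the next one to the right, a vectorized variant is possible. This is only a sketch: the name fill_nearest is made up, and breaking ties in favor of the left neighbor is an arbitrary choice of mine.

```python
import numpy as np

def fill_nearest(data):
    """Return a copy with each NaN replaced by the nearest non-NaN value.

    Hypothetical helper; ties between neighbors go to the left one.
    """
    out = np.array(data, dtype=float)
    valid = np.flatnonzero(~np.isnan(out))
    if valid.size == 0:
        return out  # nothing to fill from
    nans = np.flatnonzero(np.isnan(out))
    # Where each NaN index would be inserted among the sorted valid indexes.
    pos = np.searchsorted(valid, nans)
    # Neighboring valid indexes; clipping makes edge NaNs fall back
    # to the only neighbor that exists.
    left = valid[np.clip(pos - 1, 0, valid.size - 1)]
    right = valid[np.clip(pos, 0, valid.size - 1)]
    # Pick whichever neighbor is closer by index distance.
    nearest = np.where(np.abs(nans - left) <= np.abs(right - nans), left, right)
    out[nans] = out[nearest]
    return out

result = fill_nearest([np.nan, 1.3, np.nan, np.nan, np.nan, 1.4, np.nan])
print(result.tolist())  # [1.3, 1.3, 1.3, 1.3, 1.4, 1.4, 1.4]
```

Here the NaN at index 3 is equally far from both numbers and takes the left one (1.3), while the NaN at index 4 is closer to 1.4.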
