Question:
I have a NumPy array a like the following:
>>> str(a)
'[ nan nan nan 1.44955726 1.44628034 1.44409573\n 1.4408188 1.43657094 1.43171624 1.42649744 1.42200684 1.42117704\n 1.42040255 1.41922908 nan nan nan nan\n nan nan]'
I want to replace each NaN with the closest non-NaN value, so that all of the NaN's at the beginning get set to 1.449... and all of the NaN's at the end get set to 1.419... .
I can see how to do this for specific cases like this, but I need to be able to do it generally for any length of array, with any length of NaN's at the beginning and end of the array (there will be no NaN's in the middle of the numbers). Any ideas?
I can find the NaN's easily enough with np.isnan(), but I can't work out how to get the closest value to each NaN.
Answer 1:
I want to replace each NaN with the closest non-NaN value... there will be no NaN's in the middle of the numbers
The following will do it:
ind = np.where(~np.isnan(a))[0]
first, last = ind[0], ind[-1]
a[:first] = a[first]
a[last + 1:] = a[last]
This is a straight numpy solution requiring no Python loops, no recursion, no list comprehensions etc.
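As a minimal sketch, the same slicing idea can be wrapped in a function that works on a copy and tolerates an all-NaN input. The name fill_edge_nans and the choice to return an all-NaN array unchanged are assumptions for illustration, not part of the original answer.

```python
import numpy as np

def fill_edge_nans(a):
    """Return a copy of 1-D `a` with leading/trailing NaNs filled.

    `fill_edge_nans` is a hypothetical helper name used for illustration.
    """
    a = np.array(a, dtype=float)      # np.array copies, so the input is untouched
    ind = np.where(~np.isnan(a))[0]   # indices of the non-NaN elements
    if ind.size == 0:
        return a                      # assumption: all-NaN input is returned as-is
    first, last = ind[0], ind[-1]
    a[:first] = a[first]              # leading NaNs take the first valid value
    a[last + 1:] = a[last]            # trailing NaNs take the last valid value
    return a

example = np.array([np.nan, np.nan, 1.5, 1.2, np.nan])
result = fill_edge_nans(example)
print(result)
```

Because the function copies its input, example still contains its NaNs afterwards; drop the copy if in-place modification is what you want.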
Answer 2:
As an alternative solution (this will also linearly interpolate across NaNs in the middle of the array):
import numpy as np
# Generate data...
data = np.random.random(10)
data[:2] = np.nan
data[-1] = np.nan
data[4:6] = np.nan
print(data)
# Fill in NaN's...
mask = np.isnan(data)
data[mask] = np.interp(np.flatnonzero(mask), np.flatnonzero(~mask), data[~mask])
print(data)
This yields:
[ nan nan 0.31619306 0.25818765 nan nan
0.27410025 0.23347532 0.02418698 nan]
[ 0.31619306 0.31619306 0.31619306 0.25818765 0.26349185 0.26879605
0.27410025 0.23347532 0.02418698 0.02418698]
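The edge NaNs end up with the nearest valid value because np.interp, by default, returns the first and last known values for query points outside the known range. A small deterministic sketch (the sample values are made up for illustration) shows both behaviours at once:

```python
import numpy as np

# Made-up sample: one leading NaN, one interior NaN, one trailing NaN.
data = np.array([np.nan, 1.0, np.nan, 3.0, np.nan])
mask = np.isnan(data)

filled = data.copy()
filled[mask] = np.interp(
    np.flatnonzero(mask),    # positions to fill: 0, 2, 4
    np.flatnonzero(~mask),   # positions with known values: 1, 3
    data[~mask],             # the known values: 1.0, 3.0
)
print(filled)  # [1. 1. 2. 3. 3.]
```

Position 2 is interpolated halfway between 1.0 and 3.0, while positions 0 and 4 fall outside the known range and are held at the edge values.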
Answer 3:
NaNs have the interesting property of comparing unequal to themselves, so we can quickly find the indices of the non-NaN elements:
idx = np.nonzero(a==a)[0]
It's now easy to replace the NaNs with the desired value:
for i in range(0, idx[0]):
    a[i] = a[idx[0]]
for i in range(idx[-1] + 1, a.size):
    a[i] = a[idx[-1]]
Finally, we can put this in a function:
import numpy as np
def FixNaNs(arr):
    if len(arr.shape) > 1:
        raise Exception("Only 1D arrays are supported.")
    # NaN != NaN, so arr == arr is True only for non-NaN elements
    idxs = np.nonzero(arr == arr)[0]
    if len(idxs) == 0:
        return None
    ret = arr  # same object, not a copy: arr is modified in place
    for i in range(0, idxs[0]):
        ret[i] = ret[idxs[0]]
    for i in range(idxs[-1] + 1, ret.size):
        ret[i] = ret[idxs[-1]]
    return ret
Edit: Ouch, coming from C++ I always forget about list ranges... @aix's solution is way more elegant and efficient than my C++-ish loops, so use that instead of mine.
Answer 4:
A recursive solution!
import numpy as np
def replace_leading_NaN(a, offset=0):
    # NumPy floats have no isNaN() method, so test with np.isnan
    if np.isnan(a[offset]):
        new_value = replace_leading_NaN(a, offset + 1)
        a[offset] = new_value
        return new_value
    else:
        return a[offset]
def replace_trailing_NaN(a, offset=-1):
    if np.isnan(a[offset]):
        new_value = replace_trailing_NaN(a, offset - 1)
        a[offset] = new_value
        return new_value
    else:
        return a[offset]
Answer 5:
I came across this problem and had to find a custom solution for scattered NaNs. The code below replaces each NaN with the first number to its right; if none exists, it uses the nearest number to its left. Further manipulation could replace it with the mean of the two neighbouring numbers instead.
import numpy as np
Data = np.array([np.nan,1.3,np.nan,1.4,np.nan,np.nan])
nansIndx = np.where(np.isnan(Data))[0]
isanIndx = np.where(~np.isnan(Data))[0]
for nan in nansIndx:
    replacementCandidates = np.where(isanIndx>nan)[0]
    if replacementCandidates.size != 0:
        replacement = Data[isanIndx[replacementCandidates[0]]]
    else:
        replacement = Data[isanIndx[np.where(isanIndx<nan)[0][-1]]]
    Data[nan] = replacement
Result is:
>>> Data
array([ 1.3, 1.3, 1.4, 1.4, 1.4, 1.4])
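The same "next number to the right, otherwise nearest to the left" rule can probably be expressed without a Python loop. The sketch below uses np.searchsorted to look up, for every position, the next valid index; the function name fill_right_then_left is hypothetical, and returning an all-NaN array unchanged is an assumption.

```python
import numpy as np

def fill_right_then_left(data):
    """Hypothetical loop-free variant of the right-then-left fill."""
    data = np.array(data, dtype=float)          # work on a copy
    valid = np.flatnonzero(~np.isnan(data))     # sorted indices of real numbers
    if valid.size == 0:
        return data                             # assumption: nothing to fill from
    # For each position, find the slot in `valid` of the first valid index >= it.
    pos = np.searchsorted(valid, np.arange(data.size))
    # Positions after the last number get slot == valid.size; clamp them to the
    # last valid index, i.e. the nearest number on the left.
    pos = np.minimum(pos, valid.size - 1)
    return data[valid[pos]]

result = fill_right_then_left([np.nan, 1.3, np.nan, 1.4, np.nan, np.nan])
print(result)
```

Valid positions map to themselves, so existing numbers are left untouched.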
Answer 6:
I came up with something like this:
i = [i for i in range(len(a)) if not np.isnan(a[i])]
a = [a[i[0]] if x < i[0] else (a[i[-1]] if x > i[-1] else a[x]) for x in range(len(a))]
It's a bit clunky though, given that it's split across two lines with nested inline ifs in one of them.
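One way to avoid the nested conditionals, offered here only as a sketch, is to clamp every index into the range of valid positions with np.clip and then index the array once. This assumes the array holds at least one non-NaN value and, as in the question, has no NaNs between the numbers; the sample values are made up.

```python
import numpy as np

a = np.array([np.nan, np.nan, 1.5, 1.2, 1.1, np.nan])

valid = np.flatnonzero(~np.isnan(a))   # indices of non-NaN values: 2, 3, 4
# Indices below the first valid position are raised to it, and indices
# above the last valid position are lowered to it.
idx = np.clip(np.arange(a.size), valid[0], valid[-1])
filled = a[idx]                        # fancy indexing returns a new array
print(filled)  # [1.5 1.5 1.5 1.2 1.1 1.1]
```

Unlike the list comprehension, the result stays a NumPy array.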
Answer 7:
Here is a solution using simple Python loops. They can actually be more efficient here than numpy.where, especially with big arrays, since each loop stops at the first non-NaN value it finds. See comparison of similar code here.
import numpy as np
a = np.array([np.nan, np.nan, np.nan, 1.44955726, 1.44628034, 1.44409573, 1.4408188, 1.43657094, 1.43171624, 1.42649744, 1.42200684, 1.42117704, 1.42040255, 1.41922908, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan])
mask = np.isfinite(a)
# get the index of the first valid value
for i in range(len(mask)):
    if mask[i]:
        first = i
        break
# get the index of the last valid value
for i in range(len(mask)-1, -1, -1):
    if mask[i]:
        last = i
        break
# fill NaNs on the edges with the nearest known value
a = np.copy(a)
a[:first] = a[first]
a[last + 1:] = a[last]
print(a)
Output:
[1.44955726 1.44955726 1.44955726 1.44955726 1.44628034 1.44409573
1.4408188 1.43657094 1.43171624 1.42649744 1.42200684 1.42117704
1.42040255 1.41922908 1.41922908 1.41922908 1.41922908 1.41922908
1.41922908 1.41922908]
It replaces only the leading and trailing NaNs, as requested in the question.
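For comparison, the two search loops can also be written with argmax on the boolean mask, which returns the position of the first True. This is only a sketch on made-up sample values, and it makes no claim about which variant is faster; timing it on your own data is the safest guide.

```python
import numpy as np

a = np.array([np.nan, np.nan, 1.5, 1.2, np.nan])
mask = np.isfinite(a)

filled = a.copy()
if mask.any():                       # argmax would return 0 for an all-False mask
    first = int(mask.argmax())       # first True from the left
    # first True from the right, converted back to a normal index
    last = int(mask.size - 1 - mask[::-1].argmax())
    filled[:first] = filled[first]
    filled[last + 1:] = filled[last]
print(filled)  # [1.5 1.5 1.5 1.2 1.2]
```

The mask.any() guard matters because argmax cannot distinguish "first element is valid" from "no element is valid".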