可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
Let's say we have the following function:
def f(x, y):
if y == 0:
return 0
return x/y
This works fine with scalar values. Unfortunately when I try to use numpy arrays for x
and y
the comparison y == 0
is treated as an array operation which results in an error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-13-9884e2c3d1cd> in <module>()
----> 1 f(np.arange(1,10), np.arange(10,20))
<ipython-input-10-fbd24f17ea07> in f(x, y)
1 def f(x, y):
----> 2 if y == 0:
3 return 0
4 return x/y
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I tried to use np.vectorize
but it doesn't make a difference, the code still fails with the same error. np.vectorize
is one option which gives the result I expect.
The only solution that I can think of is to use np.where
on the y
array with something like:
def f(x, y):
np.where(y == 0, 0, x/y)
which doesn't work for scalars.
Is there a better way to write a function which contains an if statement? It should work with both scalars and arrays.
回答1:
One way is to convert x
and y
to numpy arrays inside your function:
def f(x, y):
x = np.array(x)
y = np.array(y)
return np.where(y == 0, 0, x/y)
This will work when one of x
or y
is a scalar and the other is a numpy array. It will also work if they are both arrays that can be broadcast. It won't work if they're arrays of incompatible shapes (e.g., 1D arrays of different lengths), but it's not clear what the desired behavior would be in that case anyway.
回答2:
I wonder what the problem is you're facing with np.vectorize
. It works fine on my system:
In [145]: def f(x, y):
...: if y == 0:
...: return 0
...: return x/y
In [146]: vf = np.vectorize(f)
In [147]: vf([[3],[10]], [0,1,2,0])
Out[147]:
array([[ 0, 3, 1, 0],
[ 0, 10, 5, 0]])
Note that the result dtype
is determined by the result of the first element. You can also set the desired output yourself:
In [148]: vf = np.vectorize(f, otypes=[np.float])
In [149]: vf([[3],[10]], [0,1,2,0])
Out[149]:
array([[ 0. , 3. , 1.5, 0. ],
[ 0. , 10. , 5. , 0. ]])
There are more examples in the docs.
回答3:
You can use a masked array that will perform the division only where y!=0
:
def f(x, y):
x = np.atleast_1d(np.array(x))
y = np.atleast_1d(np.ma.array(y, mask=(y==0)))
ans = x/y
ans[ans.mask]=0
return np.asarray(ans)
回答4:
A kind of clunky but effective way is to basically pre-process the data:
def f(x, y):
if type(x) == int and type(y) == int: return x/y # Will it ever be used for this?
# Change scalars to arrays
if type(x) == int: x = np.full(y.shape, x, dtype=y.dtype)
if type(y) == int: y = np.full(x.shape, y, dtype=x.dtype)
# Change all divide by zero operations to 0/1
div_zero_idx = (y==0)
x[div_zero_idx] = 0
y[div_zero_idx] = 1
return x/y
I timed all the different approaches:
def f_mask(x, y):
x = np.ma.array(x, mask=(y==0))
y = np.array(y)
ans = x/y
ans[ans.mask]=0
return np.asarray(ans)
def f_where(x, y):
x = np.array(x)
y = np.array(y)
return np.where(y == 0, 0, x/y)
def f_vect(x, y):
if y == 0:
return 0
return x/y
vf = np.vectorize(f_vect)
print timeit.timeit('f(np.random.randint(10, size=array_length), np.random.randint(10, size=array_length))', number=10000, setup="from __main__ import f; import numpy as np; array_length=1000")
print timeit.timeit('f_mask(np.random.randint(10, size=array_length), np.random.randint(10, size=array_length))', number=10000, setup="from __main__ import f_mask; import numpy as np; array_length=1000")
print timeit.timeit('f_where(np.random.randint(10, size=array_length), np.random.randint(10, size=array_length))', number=10000, setup="from __main__ import f_where; import numpy as np; array_length=1000")
print timeit.timeit('vf(np.random.randint(10, size=array_length), np.random.randint(10, size=array_length))', number=10000, setup="from __main__ import vf; import numpy as np; array_length=(1000)")
# f
# 0.760189056396
# f_mask
# 2.24414896965
# f_where
# RuntimeWarning: divide by zero encountered in divide return np.where(y == 0, 0, x/y)
# 1.08176398277
# f_vect
# 3.45374488831
The first function is the quickest, and has no warnings. The time ratios are similar if x or y are scalars. For higher dimensional arrays, the masked array approach gets relatively faster (it's still the slowest though).