How to vectorize a function which contains an if statement?

Let's say we have the following function:

def f(x, y):
    if y == 0:
        return 0
    return x/y

This works fine with scalar values. Unfortunately, when I try to use NumPy arrays for x and y, the comparison y == 0 is treated as an array operation, which results in an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-9884e2c3d1cd> in <module>()
----> 1 f(np.arange(1,10), np.arange(10,20))

<ipython-input-10-fbd24f17ea07> in f(x, y)
      1 def f(x, y):
----> 2     if y == 0:
      3         return 0
      4     return x/y

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
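A minimal illustration of why the `if` fails: `y == 0` on an array produces a boolean array, and `if` has to call `bool()` on it, which NumPy refuses for more than one element. The `.any()`/`.all()` reductions the message suggests are legal, but they answer a different question than the per-element test f needs.

```python
import numpy as np

y = np.arange(10, 20)
mask = (y == 0)          # elementwise comparison: a boolean array, not a bool

try:
    if mask:             # `if` needs a single truth value
        pass
except ValueError as err:
    print(err)           # The truth value of an array with more than one element is ambiguous...

# Reducing to one bool works, but it asks "is any/every y zero?",
# not "is this particular y zero?"
print(mask.any(), mask.all())   # False False
```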

Edit: I originally wrote that np.vectorize made no difference and failed with the same error, but that was a mistake: np.vectorize is one option which gives the result I expect.

The only solution that I can think of is to use np.where on the y array with something like:

def f(x, y):
    return np.where(y == 0, 0, x/y)

which doesn't work for scalars.

Is there a better way to write a function which contains an if statement? It should work with both scalars and arrays.

Tags: python arrays numpy vectorization

4 answers

可以哭但决不认输i

Post #2 · 2019-03-26 05:52

You can use a masked array that will perform the division only where y!=0:

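import numpy as np

def f(x, y):
    x = np.atleast_1d(np.asarray(x))
    y = np.atleast_1d(np.asarray(y))
    # Mask the positions where y == 0 so they are excluded from the division
    ans = x / np.ma.array(y, mask=(y == 0))
    # Put 0 in the masked positions and return a plain ndarray
    return ans.filled(0)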
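For what it's worth, NumPy's masked-array division already masks the entries whose divisor is zero (it also masks divisors so small that the quotient would overflow) and suppresses the warning, so the explicit mask can be left to `np.ma.divide`. A sketch, assuming an array result is acceptable even for scalar inputs; `f_ma` is a hypothetical name:

```python
import numpy as np

def f_ma(x, y):
    # atleast_1d keeps the result a MaskedArray even for scalar inputs
    x = np.atleast_1d(x)
    y = np.atleast_1d(y)
    # np.ma.divide masks the positions where y == 0
    ans = np.ma.divide(x, y)
    # Write 0 into the masked positions and return a plain ndarray
    return ans.filled(0)

print(f_ma([1, 2, 3], [0, 1, 2]))   # [0.  2.  1.5]
print(f_ma(6, [0, 2, 3]))           # [0. 3. 2.]
```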


劫难

Post #3 · 2019-03-26 05:56

I wonder what the problem is you're facing with np.vectorize. It works fine on my system:

In [145]: def f(x, y):
     ...:     if y == 0:
     ...:         return 0
     ...:     return x/y

In [146]: vf = np.vectorize(f)

In [147]: vf([[3],[10]], [0,1,2,0])
Out[147]: 
array([[ 0,  3,  1,  0],
       [ 0, 10,  5,  0]])

Note that the result dtype is determined by the result for the first element. You can also set the output type yourself with otypes:

In [148]: vf = np.vectorize(f, otypes=[float])

In [149]: vf([[3],[10]], [0,1,2,0])
Out[149]: 
array([[  0. ,   3. ,   1.5,   0. ],
       [  0. ,  10. ,   5. ,   0. ]])

There are more examples in the np.vectorize documentation.
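A self-contained version of the dtype point, which also shows that the same vectorized function accepts plain scalars (`vf_int` is just an illustrative name). Keep in mind that np.vectorize is documented as a convenience, essentially a Python loop, not a performance tool.

```python
import numpy as np

def f(x, y):
    if y == 0:
        return 0
    return x / y

# Without otypes the output dtype comes from the first call, f(3, 0) -> 0 (int),
# so the later 1.5 is truncated to 1
vf_int = np.vectorize(f)
print(vf_int([3, 3], [0, 2]))      # [0 1]

# Fixing otypes up front keeps the fractional results
vf = np.vectorize(f, otypes=[float])
print(vf([3, 3], [0, 2]))          # [0.  1.5]
print(vf(3, 2))                    # 1.5 (returned as a 0-d array)
```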


戒情不戒烟

Post #4 · 2019-03-26 05:57

One way is to convert x and y to numpy arrays inside your function:

def f(x, y):
    x = np.array(x)
    y = np.array(y)
    return np.where(y == 0, 0, x/y)

This will work when one of x or y is a scalar and the other is a numpy array. It will also work if they are both arrays that can be broadcast. It won't work if they're arrays of incompatible shapes (e.g., 1D arrays of different lengths), but it's not clear what the desired behavior would be in that case anyway.
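One caveat: np.where evaluates x/y for every element before choosing, so a zero in y still triggers a "divide by zero" (or, for 0/0, "invalid value") RuntimeWarning. A sketch that keeps this approach but silences those warnings with np.errstate, assuming you are happy to discard the inf/nan entries anyway:

```python
import numpy as np

def f(x, y):
    x = np.asarray(x)
    y = np.asarray(y)
    # x/y is computed everywhere; ignore the warnings for the entries
    # that np.where is about to replace with 0
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.where(y == 0, 0, x / y)

print(f([1, 2, 3], [0, 1, 2]))   # [0.  2.  1.5]
print(f(3, 0))                   # 0.0
```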


Aperson

Post #5 · 2019-03-26 06:01

A somewhat clunky but effective way is to pre-process the data:

import numpy as np

def f(x, y):
    # Both scalars: plain Python logic is enough (and avoids ZeroDivisionError)
    if np.isscalar(x) and np.isscalar(y):
        return 0 if y == 0 else x/y

    # Change a scalar into an array with the shape of the other argument
    if np.isscalar(x): x = np.full(y.shape, x)
    if np.isscalar(y): y = np.full(x.shape, y)

    # Change all divide by zero operations to 0/1
    # (note: this writes into x and y, so array arguments are modified in place)
    div_zero_idx = (y==0)
    x[div_zero_idx] = 0
    y[div_zero_idx] = 1

    return x/y
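Because the code above assigns into x and y, the caller's arrays are overwritten. A sketch of the same 0/1 substitution that builds new arrays instead, so the inputs stay untouched and scalars work too (`f_safe` is a hypothetical name, and it is not one of the timed versions below):

```python
import numpy as np

def f_safe(x, y):
    x = np.asarray(x)
    y = np.asarray(y)
    zero = (y == 0)
    # Same 0/1 trick, but np.where returns new arrays instead of
    # writing into the caller's x and y
    num = np.where(zero, 0, x)
    den = np.where(zero, 1, y)
    return num / den

x = np.array([1, 2, 3])
y = np.array([0, 1, 2])
print(f_safe(x, y))   # [0.  2.  1.5]
print(y)              # [0 1 2]  (unchanged)
print(f_safe(3, 0))   # 0.0
```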

I timed all the different approaches:

import timeit
import numpy as np

def f_mask(x, y):
    x = np.ma.array(x, mask=(y==0))
    y = np.array(y)
    ans = x/y
    ans[ans.mask]=0
    return np.asarray(ans)

def f_where(x, y):
    x = np.array(x)
    y = np.array(y)
    return np.where(y == 0, 0, x/y)

def f_vect(x, y):
    if y == 0:
        return 0
    return x/y

vf = np.vectorize(f_vect)

print(timeit.timeit('f(np.random.randint(10, size=array_length), np.random.randint(10, size=array_length))', number=10000, setup="from __main__ import f; import numpy as np; array_length=1000"))
print(timeit.timeit('f_mask(np.random.randint(10, size=array_length), np.random.randint(10, size=array_length))', number=10000, setup="from __main__ import f_mask; import numpy as np; array_length=1000"))
print(timeit.timeit('f_where(np.random.randint(10, size=array_length), np.random.randint(10, size=array_length))', number=10000, setup="from __main__ import f_where; import numpy as np; array_length=1000"))
print(timeit.timeit('vf(np.random.randint(10, size=array_length), np.random.randint(10, size=array_length))', number=10000, setup="from __main__ import vf; import numpy as np; array_length=1000"))

# f
# 0.760189056396

# f_mask
# 2.24414896965

# f_where
# RuntimeWarning: divide by zero encountered in divide return np.where(y == 0, 0, x/y)
# 1.08176398277

# f_vect
# 3.45374488831

The first function, f, is the quickest and raises no warnings. The time ratios are similar if x or y is a scalar. For higher-dimensional arrays, the masked-array approach gets relatively faster (although it is still the slowest).
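Another option that avoids both the warning and the explicit 0/1 substitution is the ufunc `where=` argument: `np.divide` only computes the entries where the condition holds and leaves the rest of `out` as it was. A sketch, assuming real-valued inputs (`f_divide` is a hypothetical name, and it was not part of the timings above):

```python
import numpy as np

def f_divide(x, y):
    # Assumes real-valued inputs; converting to float up front matches
    # what true division would produce anyway
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pre-fill the output with 0 in the broadcast shape of x and y
    out = np.zeros(np.broadcast(x, y).shape)
    # Only divide where y != 0; the other entries keep the 0 from `out`
    np.divide(x, y, out=out, where=(y != 0))
    return out

print(f_divide([1, 2, 3], [0, 1, 2]))   # [0.  2.  1.5]
print(f_divide(6, [0, 2, 3]))           # [0. 3. 2.]
print(f_divide(3, 0))                   # 0.0
```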

