Error raised in code that shouldn't run accord

2019-09-18 01:59发布

问题:

I have the following code as part of a function:

px = x2 - x1
py = y2 - y1
pz = z2 - z1

div = px*px + py*py    

u = ((x0 - x1) * px + (y0 - y1) * py) / div

the u= line returns RuntimeWarning: invalid value encountered in divide when run. This is because occasionally the div= line returns zero.

However, if I rewrite the u= line as:

u = np.where(div != 0, ((x0 - x1) * px + (y0 - y1) * py) / div, 0)

it still returns the same runtime warning.

This code outputs the desired numbers however I thought that the np.where function was lazy. If this is not the case then there are potential speed ups in other bits of code I have written (hence I'm asking this question). What am I missing? Does the np.where function calculate both the 'True' and 'False inputs and then select one depending on the boolean?

Note, this is the solution I've ended up with:

np.seterr(invalid='ignore')
u = np.where(div != 0, ((x0 - x1) * px + (y0 - y1) * py) / div, 0)
np.seterr(invalid='warn')

although this works fine too:

u = np.zeros_like(div)
divb = div != 0
u[divb] = ((x0[divb] - x1[divb]) * px[divb] + 
        (y0[divb] - y1[divb]) * py[divb]) / div[divb]

(which is kind of what I thought np.where did...)

Both these solutions are around the same speed but both are slower than just the np.where function on its own.

Any explanations/suggestions welcome! Thanks.

回答1:

This is a divide by 0 issue that programmers have been working around since the earliest vectorized languages (e.g. APL, MATLAB).

One solution which I've used in the past is to (conditionally) add 1 to the divisor:

u = ((x0 - x1) * px + (y0 - y1) * py) / (div + (div==0))

It doesn't work in all cases, but it might in this, since it appears the div will be 0 only if both px and py are 0. In that case the numerator is also 0. 0/1 = 0.

Or clipping div with a small value (this may be the fastest solution):

..../np.maximum(div,1e-16)

A quick search on SO for numpy divide by zero turned up other questions. For example:

https://stackoverflow.com/a/26248892/901925 suggests using errstate context to turn off the warning:

with numpy.errstate(divide='ignore'):
    u = .../div

divide works for cases like 1/0, while invalid is needed for 0/0 cases. But ignore ends up putting either inf or nan in the return array. So you still need to test for (div==0) to get a 0 value.

Though I rather like the appearance of this form:

with np.errstate(invalid='ignore'):
    u = np.where(div==0, 0, .../div)

Warren's comment explains why where doesn't work. The arguments are evaluate before being passed to the function. Lazy-evaluation requires the cooperation of the Python interpreter. It is normally part of the syntax (e.g. if,|,&).