I am having difficulty applying a function to an array when the function contains a condition. I have an inefficient workaround and am looking for an efficient (fast) approach. In a simple example:
import numpy as np

pts = np.linspace(0, 1, 11)

def fun(x, y):
    if x > y:
        return 0
    else:
        return 1
Now, if I run:
result = fun(pts, pts)
then I get the error
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
raised at the if x > y line. My inefficient workaround, which gives the correct result but is too slow, is:
result = np.full([len(pts)] * 2, np.nan)
for i in range(len(pts)):
    for j in range(len(pts)):
        result[i, j] = fun(pts[i], pts[j])
What is a nicer (and, more importantly, faster) way to obtain this?
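For comparison, the double loop can be written more compactly with np.vectorize. It broadcasts its inputs the way a NumPy ufunc does but still calls the Python function once per element pair, so it tidies the code without making it faster. Below is a minimal sketch, with fun redefined so that the snippet runs on its own:

```python
import numpy as np

def fun(x, y):
    # Scalar-only: `if` needs a single True/False value
    if x > y:
        return 0
    else:
        return 1

pts = np.linspace(0, 1, 11)

# np.vectorize wraps fun so it is called element by element.
# pts[:, None] has shape (11, 1) and pts has shape (11,), so the two inputs
# broadcast to (11, 11) and fun runs once for every (i, j) pair.
vfun = np.vectorize(fun)
result = vfun(pts[:, None], pts)

print(result.shape)  # (11, 11)
```

The cells where pts[i] > pts[j] (below the diagonal) are 0, and all other cells are 1, matching the loop.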
EDIT: using
def fun(x, y):
    if x > y:
        return 0
    else:
        return 1

x = np.arange(10)
y = np.arange(10)
xv, yv = np.meshgrid(x, y)
result = fun(xv, yv)
still raises the same ValueError.
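The ValueError comes from the if statement, which needs a single truth value. An elementwise comparison of arrays produces many truth values, one per element. Here is a minimal sketch using two small made-up arrays (they are not from the post). The 2-D arrays returned by np.meshgrid fail in the same way:

```python
import numpy as np

x = np.array([1, 2])
y = np.array([2, 1])

mask = x > y  # elementwise comparison: array([False,  True])

# `if mask:` calls bool(mask) behind the scenes, which NumPy refuses
# for arrays with more than one element
raised = False
try:
    bool(mask)
except ValueError as err:
    raised = True
    print("ValueError:", err)

# .any() and .all() reduce the array to one unambiguous truth value
print(mask.any())  # True: at least one x[k] > y[k]
print(mask.all())  # False: not every x[k] > y[k]
```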
For a Cartesian comparison of these two 1-D arrays, reshape one of them so it can use broadcasting, e.g. pts[:, None] > pts.

Your function, with the if, only works for scalar inputs. Given arrays, x > y produces a boolean array, which cannot be used in an if statement. Your iteration works because it passes scalar values. For some complex functions that is the best you can do (np.vectorize can make the iteration simpler, but not faster).

My answer is to look at the array comparison and derive the result from that. In this case, the three-argument np.where, as in np.where(pts[:, None] > pts, 0, 1), does a nice job of mapping the boolean array onto the desired 0/1. There are other ways of doing this mapping as well. The extra level of looping in your double loop is replaced by an added dimension: the broadcasted None.

For a more complex example, if your arrays are larger, or if you can write into an already preallocated array, you could consider Numba.
Example
Timings
*Possibly influenced by caching. Test on your own examples.
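The following is an illustration only, not the answer's own example. It is a minimal sketch of the Numba approach: compile the scalar function and the double loop with numba.njit, then write into a preallocated array. It assumes Numba is installed. If Numba is missing, a hypothetical fallback leaves the functions as plain Python, so the sketch still runs, only slowly. The helper name fill_pairwise is also hypothetical.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Hypothetical fallback: without Numba, leave functions as plain Python
    def njit(func):
        return func

@njit
def fun(x, y):
    # Same scalar logic as in the question
    if x > y:
        return 0
    else:
        return 1

@njit
def fill_pairwise(pts, out):
    # Hypothetical helper: write fun(pts[i], pts[j]) into a preallocated array
    n = pts.shape[0]
    for i in range(n):
        for j in range(n):
            out[i, j] = fun(pts[i], pts[j])
    return out

pts = np.linspace(0, 1, 11)
out = np.empty((pts.shape[0], pts.shape[0]), dtype=np.int64)
fill_pairwise(pts, out)  # fills `out` in place (compiled on first call if Numba is available)
```

The first call to an njit-compiled function includes compilation time. When comparing against the loop or the vectorized versions, time a later call.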
The error is quite explicit. Suppose you have two arrays x and y such that x > y gives np.array([0,1]) (that is, [False, True]). What should the result of your if np.array([0,1]) statement be: true or false? NumPy is telling you this is ambiguous. Using a.any() or a.all() is explicit, so NumPy offers you two solutions: either any cell pair fulfills the condition, or all of them do. Both give an unambiguous truth value. You have to decide exactly what you mean by "vector x is larger than vector y".

The NumPy way to operate on all pairs of x and y, so that you can test x[i] > y[j], is to use np.meshgrid to generate all pairs, e.g. xv, yv = np.meshgrid(x, y, indexing='ij'). Either send xv and yv to fun, or create the mesh inside the function, whichever makes more sense. Comparing the two meshes then checks every pair (x[i], y[j]) for x[i] > y[j]. If you want the comparison result itself, just return xv > yv, where each cell [i][j] corresponds to x[i] and y[j]. That correspondence requires indexing='ij'. With the default indexing='xy', cell [i][j] holds x[j] and y[i], so the result comes out transposed. In your case, a fun that builds the mesh and returns xv > yv will return a matrix where fun(x,y)[i][j] is True if x[i] > y[j], or False otherwise. Alternatively, passing that matrix to np.nonzero (or to one-argument np.where) returns a tuple of two index arrays, such that x[i] > y[j] holds for every pair (i, j) taken from them.
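A minimal sketch comparing the vectorized approaches from these answers, assuming 1-D inputs like pts in the question (for illustration, x and y below are both set to that grid). It covers:

- np.meshgrid with indexing='ij'
- the default 'xy' meshgrid, which gives the transposed result
- the broadcasted None
- three-argument np.where for the 0/1 mapping
- np.nonzero for the index pairs

```python
import numpy as np

x = np.linspace(0, 1, 11)
y = np.linspace(0, 1, 11)

# meshgrid with matrix indexing: xv[i, j] == x[i] and yv[i, j] == y[j]
xv, yv = np.meshgrid(x, y, indexing="ij")
mask_mesh = xv > yv

# Default 'xy' indexing swaps the roles: xv_xy[i, j] == x[j], yv_xy[i, j] == y[i],
# so the comparison comes out transposed
xv_xy, yv_xy = np.meshgrid(x, y)
mask_xy = xv_xy > yv_xy

# Broadcasting with None: x[:, None] has shape (11, 1), y has shape (11,)
mask_bcast = x[:, None] > y

# Three-argument np.where maps True/False onto the question's 0/1
result = np.where(mask_bcast, 0, 1)

# Index pairs (i, j) with x[i] > y[j]
rows, cols = np.nonzero(mask_bcast)

print(np.array_equal(mask_mesh, mask_bcast))  # True
print(np.array_equal(mask_xy, mask_bcast.T))  # True
print(len(rows))                              # 55
```

Broadcasting avoids building the two full (n, n) coordinate grids that np.meshgrid returns by default. The boolean comparison result is (n, n) either way.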