scipy.optimize.minimize Jacobian function causes &

I am using the BFGS method, giving it the negative log likelihood of my squared exponential/RBF kernel, as well as the gradient (Jacobian) of it. Leaving out the gradient, it works fine using first differences - but the

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

error comes up once I try and use the gradient of the NLL. Also note that while my source code in the SE_der function (gradient/Jacobian) below doesn't use .any() or .all() in the result, I have also tried both of those, only to get exactly the same error.

The preceding trace is:

Traceback (most recent call last):
File "loaddata.py", line 107, in <module>
gp.fit(X, y)
File "/home/justinting/programming/bhm/ML/gp.py", line 33, in fit
res = minimize(self.SE_NLL, gp_hp_guess, method='bfgs', jac=True)
File "/usr/lib/python3.5/site-packages/scipy/optimize/_minimize.py", line 441, in minimize
return _minimize_bfgs(fun, x0, args, jac, callback, **options)
File "/usr/lib/python3.5/site-packages/scipy/optimize/optimize.py", line 865, in _minimize_bfgs
old_fval, old_old_fval)
File "/usr/lib/python3.5/site-packages/scipy/optimize/optimize.py", line 706, in _line_search_wolfe12
old_fval, old_old_fval)
File "/usr/lib/python3.5/site-packages/scipy/optimize/linesearch.py", line 282, in line_search_wolfe2
phi, derphi, old_fval, old_old_fval, derphi0, c1, c2, amax)
File "/usr/lib/python3.5/site-packages/scipy/optimize/linesearch.py", line 379, in scalar_search_wolfe2
if (phi_a1 > phi0 + c1 * alpha1 * derphi0) or \

The relevant code is as follows:

gp_hp_guess = [1.0] * 3 # initial guess
res = minimize(self.SE_NLL, gp_hp_guess, method='bfgs', jac=self.SE_der)

# other stuff

def SE_der(self, args):

    [f_err, l_scale, n_err] = args
    L = self.L_create(f_err, l_scale, n_err)
    alpha = linalg.solve(L.T, (linalg.solve(L, self.y))) # save for use with derivative func
    aaT = alpha.dot(alpha.T)
    K_inv = np.linalg.inv(L.T).dot(np.linalg.inv(L))
    # self.K_inv = np.linalg.inv(self.L.T).dot(np.linalg.inv(self.L))
    dK_dtheta = np.gradient(self.K_se(self.X, self.X, f_err, l_scale))[0]
    der = 0.5 * np.matrix.trace((aaT - K_inv).dot(dK_dtheta))
    return -der

def SE_NLL(self, args):

    [f_err, l_scale, n_err] = args
    L = self.L_create(f_err, l_scale, n_err)

    alpha = linalg.solve(L.T, (linalg.solve(L, self.y))) # save for use with derivative func
    nll = (
        0.5 * self.y.T.dot(alpha) + 
        np.matrix.trace(L) + # sum of diagonal
        L.shape[0]/2 * math.log(2*math.pi)
            )
    return nll

I've left out the source code of the helper functions as the NLL works fine when the gradient function isn't used, and they share the same helper functions.

When calling the SE_der function directly passing in the optimised parameters after the fact (and not actually using the gradient in the optimisation), it outputs a single number as expected (or at least I think that's what is expected), so I'm failing to spot the problem.

Is this error a misunderstanding on my part of what scipy expects in its Jacobian function, or something else? I tried digging through the Python source code, but the actual function call dealing with the functions are hidden behind functions that don't seem to be in the Python code on Github - I'm not sure if they're in private/C++ repos somewhere else.

Look at the side bar. See all those SO questions about that same ValueError?

While the circumstances vary, in nearly every case it is the result of using a boolean array in a Python context that expects a scalar boolean.

A simple example is

In [236]: if np.arange(10)>5:print('yes')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-236-633002262b65> in <module>()
----> 1 if np.arange(10)>5:print('yes')

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

np.arange(10)>5 produces a boolean array, np.array([False, False, ...]).

Combining boolean expressions can also produce this. np.arange(10)>5 | np.arange(10)<2 produces this, (np.arange(10)>5) | (np.arange(10)<2) does not - because of the presidence of the logical operators. Using and instead of | in this context is hopeless.

I'll look at your code in more detail, but in the meantime maybe this will help you find the problem yourself.

==================

from the error stack:

if (phi_a1 > phi0 + c1 * alpha1 * derphi0) or \

the code at this point expects (phi_a1 > phi0 + c1 * alpha1 * derphi0) (and whatever follows the or) to be a scalar. Presumably one of those variables is an array with multiple values. Admittedly this is occurring way down the calling stack so it will difficult to trace those values back to your code.

Prints, focusing on variable type and shape, might be most useful. Sometimes on these iterative solvers the code runs fine for one loop, and then some variable changes to array, and it chokes on the next loop.

==================

Why are you using np.matrix.trace? In my tests that produces a 2d single element np.matrix. It's not obvious that it would produce this ValueError, but it's still suspicious.