I am using scipy's BFGS method, giving it the negative log likelihood of my squared exponential/RBF kernel as well as its gradient (Jacobian). When I leave out the gradient it works fine using first differences, but the
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
error comes up as soon as I try to use the analytic gradient of the NLL. Note that while my source code in the SE_der function (the gradient/Jacobian) below doesn't use .any() or .all() on the result, I have also tried both, only to get exactly the same error.
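For reference, the call that works when I leave the gradient out (so BFGS falls back to first differences) is just the plain minimize call without jac, i.e. something like:

res = minimize(self.SE_NLL, gp_hp_guess, method='bfgs')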
The traceback leading up to the error is:
Traceback (most recent call last):
  File "loaddata.py", line 107, in <module>
    gp.fit(X, y)
  File "/home/justinting/programming/bhm/ML/gp.py", line 33, in fit
    res = minimize(self.SE_NLL, gp_hp_guess, method='bfgs', jac=True)
  File "/usr/lib/python3.5/site-packages/scipy/optimize/_minimize.py", line 441, in minimize
    return _minimize_bfgs(fun, x0, args, jac, callback, **options)
  File "/usr/lib/python3.5/site-packages/scipy/optimize/optimize.py", line 865, in _minimize_bfgs
    old_fval, old_old_fval)
  File "/usr/lib/python3.5/site-packages/scipy/optimize/optimize.py", line 706, in _line_search_wolfe12
    old_fval, old_old_fval)
  File "/usr/lib/python3.5/site-packages/scipy/optimize/linesearch.py", line 282, in line_search_wolfe2
    phi, derphi, old_fval, old_old_fval, derphi0, c1, c2, amax)
  File "/usr/lib/python3.5/site-packages/scipy/optimize/linesearch.py", line 379, in scalar_search_wolfe2
    if (phi_a1 > phi0 + c1 * alpha1 * derphi0) or \
The relevant code is as follows:
gp_hp_guess = [1.0] * 3  # initial guess
res = minimize(self.SE_NLL, gp_hp_guess, method='bfgs', jac=self.SE_der)

# other stuff

def SE_der(self, args):
    [f_err, l_scale, n_err] = args
    L = self.L_create(f_err, l_scale, n_err)
    alpha = linalg.solve(L.T, (linalg.solve(L, self.y)))  # save for use with derivative func
    aaT = alpha.dot(alpha.T)
    K_inv = np.linalg.inv(L.T).dot(np.linalg.inv(L))
    # self.K_inv = np.linalg.inv(self.L.T).dot(np.linalg.inv(self.L))
    dK_dtheta = np.gradient(self.K_se(self.X, self.X, f_err, l_scale))[0]
    der = 0.5 * np.matrix.trace((aaT - K_inv).dot(dK_dtheta))
    return -der

def SE_NLL(self, args):
    [f_err, l_scale, n_err] = args
    L = self.L_create(f_err, l_scale, n_err)
    alpha = linalg.solve(L.T, (linalg.solve(L, self.y)))  # save for use with derivative func
    nll = (
        0.5 * self.y.T.dot(alpha) +
        np.matrix.trace(L) +  # sum of diagonal
        L.shape[0] / 2 * math.log(2 * math.pi)
    )
    return nll
I've left out the source code of the helper functions, since the NLL works fine when the gradient function isn't used and both functions share the same helpers.
When I call the SE_der function directly after the fact, passing in the optimised parameters (from a fit that didn't actually use the gradient in the optimisation), it outputs a single number as expected (or at least I think that's what's expected), so I'm failing to spot the problem.
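That sanity check is just a direct call along the lines of the following (where res.x holds the parameters from the gradient-free fit; the names here are only for illustration):

print(gp.SE_der(res.x))  # prints a single scalar value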
Is this error a misunderstanding on my part of what scipy expects from its Jacobian function, or something else? I tried digging through the scipy source code, but the actual function calls dealing with these functions are hidden behind layers that don't seem to be in the Python code on GitHub, and I'm not sure if they're in private/C++ repos somewhere else.
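For completeness, here is a minimal standalone example of how I currently understand an analytic jac is supposed to be wired up with minimize (a toy quadratic rather than my kernel code, with the gradient callable returning one entry per parameter):

import numpy as np
from scipy.optimize import minimize

c = np.array([1.0, 2.0, 3.0])

def f(x):
    # scalar objective value
    return np.sum((x - c) ** 2)

def grad(x):
    # gradient of f: an array with one entry per parameter in x
    return 2.0 * (x - c)

res = minimize(f, [0.0, 0.0, 0.0], method='bfgs', jac=grad)
print(res.x)  # converges to [1. 2. 3.]

As I understand it, jac should return an array with one entry per parameter, matching the shape of the initial guess; if that understanding is wrong, that may well be where my problem lies.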