Jacobian and Hessian inputs in `scipy.optimize.min

2020-03-01 23:32发布

I am trying to understand how the "dogleg" method works in Python's scipy.optimize.minimize function. I am adapting the example at the bottom of the help page.

The dogleg method requires a Jacobian and Hessian argument according to the notes. For this I use the numdifftools package:

import numpy as np
from scipy.optimize import minimize
from numdifftools import Jacobian, Hessian

def fun(x,a):
    return (x[0] - 1)**2 + (x[1] - a)**2

x0 = np.array([2,0]) # initial guess
a = 2.5

res = minimize(fun, x0, args=(a), method='dogleg',
               jac=Jacobian(fun)([2,0]), hess=Hessian(fun)([2,0]))

print(res)

Edit:

If I make a change as suggested by a post below,

res = minimize(fun, x0, args=a, method='dogleg',
               jac=Jacobian(lambda x: fun(x,a)),
               hess=Hessian(lambda x: fun(x,a)))

I get an error TypeError: <lambda>() takes 1 positional argument but 2 were given. What am I doing wrong?

Also is it correct to calculate the Jacobian and Hessian at the initial guess x0?

2条回答
贼婆χ
2楼-- · 2020-03-02 00:08

That error is coming from the calls to Jacobian and Hessian, not in minimize. Replacing Jacobian(fun) with Jacobian(lambda x: fun(x, a)) and similarly for Hessian should do the trick (since now the function being differentiated only has a single vector argument).

One other thing: (a) is just a, if you want it to be a tuple use (a,).

import numpy as np
from scipy.optimize import minimize
from numdifftools import Jacobian, Hessian

def fun(x, a):
    return (x[0] - 1) **2 + (x[1] - a) **2

def fun_der(x, a):
    return Jacobian(lambda x: fun(x, a))(x).ravel()

def fun_hess(x, a):
    return Hessian(lambda x: fun(x, a))(x)

x0 = np.array([2, 0]) # initial guess
a = 2.5

res = minimize(fun, x0, args=(a,), method='dogleg', jac=fun_der, hess=fun_hess)
print(res)
查看更多
Ridiculous、
3楼-- · 2020-03-02 00:14

I get that this is a toy example, but I would like to point out that using a tool like Jacobian or Hessian to calculate the derivatives instead of deriving the function itself is fairly costly. For example with your method:

x0 = np.array([2, 0])
a = 2.5
%timeit minimize(fun, x0, args=(a,), method='dogleg', jac=fun_der, hess=fun_hess)
100 loops, best of 3: 13.6 ms per loop

But you could calculate the derivative functions as such:

def fun_der(x, a):
    dx = 2 * (x[0] - 1)
    dy = 2 * (x[1] - a)
    return np.array([dx, dy]

def fun_hess(x, a):
    dx = 2
    dy = 2
    return np.diag([dx, dy])

%timeit minimize(fun, x0, args=(a,), method='dogleg', jac=fun_der, hess=fun_hess)
1000 loops, best of 3: 279 µs per loop

As you can see that is almost 50x faster. It really starts to add up with complex functions. As such I always try to derive the functions explicitly myself, regardless of how difficult that may be. One fun example is the kernel based implementation of Inductive Matrix Completion.

argmin --> sum((A - gamma_u(X) Z gamma_v(Y))**2 - lambda * ||Z||**2)
where gamma_u = (1/sqrt(m_x)) * [cos(UX), sin(UX)] and
gamma_v = (1/sqrt(m_y)) * [cos(VY), sin(VY)]
X.shape = n_x, p; Y.shape = n_y, q; U.shape = m_x, p; V.shape = m_y, q; Z.shape = 2m_x, 2m_y

Calculating the gradient and hessian from this equation is extremely unreasonable in comparison to explicitly deriving and utilizing those functions. So as @bnaul pointed out, if your function does have closed form derivates you really do want to calculate and use them.

查看更多
登录 后发表回答