I have a reasonably simple constrained optimization problem but get different answers depending on how I do it. Let's get the imports and a pretty-print function out of the way first:
import numpy as np
from scipy.optimize import minimize, LinearConstraint, NonlinearConstraint, SR1
def print_res( res, label ):
    print("\n\n ***** ", label, " ***** \n")
    print(res.message)
    print("obj func value at solution", obj_func(res.x))
    print("starting values: ", x0)
    print("ending values: ", res.x.astype(int) )
    print("% diff", (100.*(res.x-x0)/x0).astype(int) )
    print("target achieved?",target,res.x.sum())
The sample data is very simple:
n = 5
x0 = np.arange(1,6) * 10_000
target = x0.sum() + 5_000   # increase sum from 150,000 to 155,000
Here's the constrained optimization (including jacobians). In words, the objective function I want to minimize is just the sum of squared percentage changes from the initial values to the final values. The linear equality constraint simply requires x.sum() to equal a constant.
def obj_func(x):
    return ( ( ( x - x0 ) / x0 ) ** 2 ).sum()

def obj_jac(x):
    return 2. * ( x - x0 ) / x0 ** 2

def constr_func(x):
    return x.sum() - target

def constr_jac(x):
    return np.ones(n)
And for comparison, I've re-factored it as an unconstrained minimization by using the equality constraint to replace x[0] with a function of x[1:]. Note that the unconstrained function is passed x0[1:], whereas the constrained function is passed x0.
def unconstr_func(x):
    x_one = target - x.sum()
    first_term = ( ( x_one - x0[0] ) / x0[0] ) ** 2
    second_term = ( ( ( x - x0[1:] ) / x0[1:] ) ** 2 ).sum()
    return first_term + second_term
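A quick check (illustrative only, reusing the definitions above) that the two formulations agree on any point satisfying the constraint:
# build an arbitrary feasible point by adjusting x[0] to hit the target
x_test = x0.astype(float).copy()
x_test[0] += target - x_test.sum()
assert np.isclose(obj_func(x_test), unconstr_func(x_test[1:]))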
I then try to minimize in three ways:
- Unconstrained with 'Nelder-Mead'
- Constrained with 'trust-constr' (w/ & w/o jacobian)
- Constrained with 'SLSQP' (w/ & w/o jacobian)
Code:
##### (1) unconstrained
res0 = minimize( unconstr_func, x0[1:], method='Nelder-Mead') # OK, but weird note
res0.x = np.hstack( [target - res0.x.sum(), res0.x] )
print_res( res0, 'unconstrained' )
##### (2a) constrained -- trust-constr w/ jacobian
nonlin_con = NonlinearConstraint( constr_func, 0., 0., constr_jac )
resTCjac = minimize( obj_func, x0, method='trust-constr',
                     jac='2-point', hess=SR1(), constraints = nonlin_con )
print_res( resTCjac, 'trust-const w/ jacobian' )
##### (2b) constrained -- trust-constr w/o jacobian
nonlin_con = NonlinearConstraint( constr_func, 0., 0. )
resTC = minimize( obj_func, x0, method='trust-constr',
                  jac='2-point', hess=SR1(), constraints = nonlin_con )
print_res( resTC, 'trust-const w/o jacobian' )
##### (3a) constrained -- SLSQP w/ jacobian
eq_cons = { 'type': 'eq', 'fun' : constr_func, 'jac' : constr_jac }
resSQjac = minimize( obj_func, x0, method='SLSQP',
                     jac = obj_jac, constraints = eq_cons )
print_res( resSQjac, 'SLSQP w/ jacobian' )
##### (3b) constrained -- SLSQP w/o jacobian
eq_cons = { 'type': 'eq', 'fun' : constr_func }
resSQ = minimize( obj_func, x0, method='SLSQP',
                  jac = obj_jac, constraints = eq_cons )
print_res( resSQ, 'SLSQP w/o jacobian' )
Here is some simplified output (and of course you can run the code to get the full output):
starting values: [10000 20000 30000 40000 50000]
***** (1) unconstrained *****
Optimization terminated successfully.
obj func value at solution 0.0045454545454545305
ending values: [10090 20363 30818 41454 52272]
***** (2a) trust-const w/ jacobian *****
The maximum number of function evaluations is exceeded.
obj func value at solution 0.014635854609684874
ending values: [10999 21000 31000 41000 51000]
***** (2b) trust-const w/o jacobian *****
`gtol` termination condition is satisfied.
obj func value at solution 0.0045454545462939935
ending values: [10090 20363 30818 41454 52272]
***** (3a) SLSQP w/ jacobian *****
Optimization terminated successfully.
obj func value at solution 0.014636111111111114
ending values: [11000 21000 31000 41000 51000]
***** (3b) SLSQP w/o jacobian *****
Optimization terminated successfully.
obj func value at solution 0.014636111111111114
ending values: [11000 21000 31000 41000 51000]
Notes:
(1) & (2b) are plausible solutions in that they achieve significantly lower objective function values, and intuitively we'd expect the variables with larger starting values to move more (both absolutely and in percentage terms) than the smaller ones.
Adding the jacobian to 'trust-constr' causes it to get the wrong answer (or at least a worse answer) and also to run out of function evaluations. Maybe the jacobian is wrong, but the function is so simple that I'm pretty sure it's correct (?)
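One quick way to confirm that is scipy.optimize.check_grad, which compares obj_jac against a finite-difference gradient (evaluated away from x0, where the gradient is identically zero):
from scipy.optimize import check_grad

# should print a number near zero: the norm of the difference between
# obj_jac and a finite-difference approximation of the gradient
print(check_grad(obj_func, obj_jac, x0 * 1.05))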
'SLSQP' doesn't seem to work with or without the jacobian supplied, but it runs very fast and claims to terminate successfully. This seems very worrisome, in that getting the wrong answer while claiming to have terminated successfully is pretty much the worst possible outcome.
Initially I used very small starting values and targets (just 1/1,000 of what I have above), and in that case all 5 approaches above work fine and give the same answers. My sample data is still extremely small, and it seems kinda bizarre for it to handle 1,2,..,5 but not 1000,2000,..,5000. FWIW, note that the 3 incorrect results all hit the target by adding 1,000 to each initial value -- this satisfies the constraint but comes nowhere near minimizing the objective function (b/c variables with higher initial values should be increased more than lower ones to minimize the sum of squared percentage differences).
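For reference, this particular problem is simple enough to solve in closed form with a Lagrange multiplier, and the exact solution agrees with the answers from (1) and (2b):
# stationarity: 2*(x - x0)/x0**2 = lam * ones  =>  x = x0 + lam * x0**2 / 2,
# and the constraint x.sum() == target pins down lam
x0f = x0.astype(float)
lam = 2. * (target - x0.sum()) / (x0f ** 2).sum()
x_exact = x0f + lam * x0f ** 2 / 2.
print(x_exact.astype(int))   # [10090 20363 30818 41454 52272]
print(obj_func(x_exact))     # ~0.0045454545...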
So my question is really just what is happening here and why do only (1) and (2b) seem to work?
More generally, I'd like to find a good python-based approach to this and similar optimization problems, and will consider answers using other packages besides scipy, although the best answer would ideally also address what is going on with scipy here (e.g. is this user error or a bug I should report on GitHub?).
This is a partial answer that I'm putting here to keep the question from getting even bigger, but I'd still love to see a more comprehensive and explanatory answer. These fixes are based on comments from two others, but neither of them fully wrote out the code, and I thought it would make sense to make that explicit, so here it is:
Fixing 2a (trust-constr with jacobian)
It seems that the key here, with regard to the constraint's Jacobian and Hessian, is to specify neither or both (but not the Jacobian only). @SubhaneilLahiri commented to this effect, and there was also an error message to this effect that I initially failed to notice. So I fixed it by defining a Hessian function for the constraint and adding it to the constraint.
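The constraint x.sum() - target is linear, so its Hessian is identically zero. A minimal sketch of that fix, reusing the names defined in the question:
def constr_hess(x, v):
    # trust-constr expects the constraint Hessian with signature hess(x, v);
    # the constraint is linear, so this is just a matrix of zeros
    return np.zeros((n, n))

nonlin_con = NonlinearConstraint( constr_func, 0., 0., constr_jac, constr_hess )
resTCjac = minimize( obj_func, x0, method='trust-constr',
                     jac='2-point', hess=SR1(), constraints=nonlin_con )
print_res( resTCjac, 'trust-constr w/ jacobian and hessian' )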
Fixing 3a & 3b (SLSQP)
This just seemed to be a matter of making the tolerance smaller, as suggested by @user545424. So I just added options={'ftol':1e-15} to the minimization.
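For example, applied to (3a) (the same pattern works for (3b)):
eq_cons = { 'type': 'eq', 'fun': constr_func, 'jac': constr_jac }
resSQjac = minimize( obj_func, x0, method='SLSQP',
                     jac=obj_jac, constraints=eq_cons,
                     options={'ftol': 1e-15} )
print_res( resSQjac, 'SLSQP w/ jacobian, ftol=1e-15' )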
Here is how this problem could be solved using nlopt, which is a library for nonlinear optimization that I've been pretty impressed with. First, the objective function and gradient are both defined using the same function:
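A minimal sketch of that pattern with nlopt's Python API (the callback receives a grad array and fills it in place whenever a gradient-based algorithm asks for it; the variable names here are my own):
import numpy as np
import nlopt

n = 5
x0 = np.arange(1, 6) * 10_000
target = x0.sum() + 5_000

def obj_func(x, grad):
    # nlopt passes a non-empty grad only for gradient-based algorithms;
    # when it does, fill it in place and return the objective value
    if grad.size > 0:
        grad[:] = 2. * (x - x0) / x0 ** 2
    return (((x - x0) / x0) ** 2).sum()

def constr_func(x, grad):
    # equality constraint: x.sum() - target == 0
    if grad.size > 0:
        grad[:] = np.ones(n)
    return x.sum() - target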
Then, to run the minimization using Nelder-Mead and SLSQP:
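Again a sketch rather than the original code: SLSQP (LD_SLSQP) can take the equality constraint directly, while nlopt's Nelder-Mead (LN_NELDERMEAD) is unconstrained, so here it runs on the reformulated problem from the question:
def unconstr_func(x, grad):
    # x holds x[1:]; x[0] is implied by the equality constraint
    x_one = target - x.sum()
    return ((x_one - x0[0]) / x0[0]) ** 2 + (((x - x0[1:]) / x0[1:]) ** 2).sum()

# Nelder-Mead (derivative-free, no constraints) on the reduced problem
opt_nm = nlopt.opt(nlopt.LN_NELDERMEAD, n - 1)
opt_nm.set_min_objective(unconstr_func)
opt_nm.set_ftol_abs(1e-15)
x_nm = opt_nm.optimize(x0[1:].astype(float))
x_nm = np.hstack([target - x_nm.sum(), x_nm])

# SLSQP (gradient-based) on the full constrained problem
opt_sq = nlopt.opt(nlopt.LD_SLSQP, n)
opt_sq.set_min_objective(obj_func)
opt_sq.add_equality_constraint(constr_func, 1e-8)
opt_sq.set_ftol_abs(1e-15)
x_sq = opt_sq.optimize(x0.astype(float))

print("Nelder-Mead:", x_nm.astype(int), opt_nm.last_optimum_value())
print("SLSQP:      ", x_sq.astype(int), opt_sq.last_optimum_value())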
Both runs converge to essentially the same solution as (1) and (2b) above.
When testing this out, I think I discovered what the issue with the original attempt was. If I set the absolute tolerance on the function to 1e-8, which is what the scipy functions default to, I get exactly the wrong answer you were seeing. So my guess is that during SLSQP the minimizer ends up somewhere in the objective landscape where the next jump improves the objective by less than 1e-8, so it stops there.