The findHomography() function in OpenCV finds a perspective transformation between two planes. The computed homography matrix is refined further (using inliers only, in the case of a robust method) with the Levenberg-Marquardt method to reduce the re-projection error even more. Can anyone provide links to C/C++ code for the Levenberg-Marquardt algorithm that minimizes this error function? It would help me understand the mathematics behind the algorithm. (Everywhere on the net, only libraries or problem-specific code have been uploaded, not a detailed implementation of the algorithm itself.)
I know it's not C/C++, but this file on the MATLAB File Exchange might have source code that could help you:
http://www.mathworks.com/matlabcentral/fileexchange/16063
If you understand C/C++ you will probably have no problem understanding Matlab, especially if you're looking at source code for the purposes of furthering your understanding.
The OpenCV code is open source, so check the modules. The OpenCV cheatsheet has the Levenberg-Marquardt formulation (column 3, page 1). You can grep the source for that formulation, or just for the name of the method:
grep -rnw . -e "Levenberg"
For example, you can discover a number of levmar*.c files in the modules/legacy folder. The reason for running LMA after solving a system of linear equations with RANSAC is that the linear formulation optimizes the wrong metric: an algebraic error in the parameters of the homography. A better metric is the sum of squared residuals in terms of point coordinates. In this metric the solution is non-linear and has to be found iteratively. The linear solution is used to initialize the non-linear algorithm and increase its chances of converging to the global minimum.
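To make the "better metric" concrete, here is a minimal C++ sketch of the squared geometric (reprojection) residual for one point correspondence under a homography H. The row-major 3x3 array layout and the function name are illustrative choices for this sketch, not OpenCV's API; LM minimizes the sum of this quantity over all inlier correspondences.

```cpp
#include <cmath>

// Apply a 3x3 homography H (row-major) to the point (x, y) and return the
// squared geometric residual against the observed point (u, v).
double reprojError2(const double H[9], double x, double y, double u, double v) {
    double w  = H[6] * x + H[7] * y + H[8];       // homogeneous scale
    double px = (H[0] * x + H[1] * y + H[2]) / w; // projected x
    double py = (H[3] * x + H[4] * y + H[5]) / w; // projected y
    double dx = px - u, dy = py - v;
    return dx * dx + dy * dy; // geometric (not algebraic) squared residual
}
```

Note that because of the division by w, this residual is non-linear in the entries of H, which is exactly why an iterative method is needed.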
Seeing the code, however, is not always the easiest way to understand a complex concept. Here is an 'evolutionary' line-up of optimization algorithms that will hopefully facilitate your understanding:
1. Optimization can always be cast as minimization of a cost or of residuals.
2. If the function is convex (a bowl shape) it has only one global minimum, which can be found by sliding downhill. Since the gradient points uphill, we take it with a minus sign:
param[t+1] = param[t] - k * cost', where t is the iteration, k is the step size, and ' stands for the gradient (derivative w.r.t. the parameters)
3. So far so good, but in some cases (imagine a cost function shaped like a snowboarding half-pipe) following the gradient makes you overshoot and zig-zag from one side of the valley to the other while descending only slowly toward the global minimum. The solution is to use a better function approximation (a Taylor expansion up to the second derivative) to arrive at
param[t+1] = param[t] - (k / cost'') * cost', where '' is the second derivative
Intuitively, a rapidly changing function means a large second derivative, so we slow down convergence by reducing the step (dividing it by a large number), and vice versa.
4. For a quadratic error, which often represents a sum of squared residuals or distances, we can approximate the second derivative as a product of first derivatives (called Jacobians, shown as J, which is generally a matrix): second derivative or Hessian ≈ J^T * J, so the optimization looks like
param[t+1] = param[t] - (J^T * J)^-1 * cost' - called Gauss-Newton
Here we dropped k as a step for simplicity. Compare this with the previous formulas - we are almost there! Also note that on Wikipedia and other sites, instead of cost' you may see J^T * r with the residual r_i = y_i - f(x_i, param); this is the same gradient up to a constant factor, so you can ignore the difference for now. Go further only if you are comfortable with the last formulation.
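Here is a minimal C++ sketch of Gauss-Newton for a single parameter, using an assumed toy model f(x) = exp(a * x) with data generated at a = 0.5 (the model, data, and iteration count are illustrative, not anything from OpenCV). With one parameter, J^T * J and J^T * r are just scalars:

```cpp
#include <cmath>

// Gauss-Newton fit of the parameter a in the model f(x) = exp(a * x).
// Residual r_i = exp(a * x_i) - y_i; Jacobian J_i = dr_i/da = x_i * exp(a * x_i).
// Update: a -= (J^T J)^-1 * J^T r.
double gaussNewtonFit() {
    const double x[5] = {0.0, 0.5, 1.0, 1.5, 2.0};
    double y[5];
    for (int i = 0; i < 5; ++i) y[i] = std::exp(0.5 * x[i]); // ground truth a = 0.5
    double a = 0.0; // initial guess
    for (int t = 0; t < 25; ++t) {
        double JtJ = 0.0, Jtr = 0.0;
        for (int i = 0; i < 5; ++i) {
            double r = std::exp(a * x[i]) - y[i];
            double J = x[i] * std::exp(a * x[i]);
            JtJ += J * J; // scalar version of J^T * J
            Jtr += J * r; // scalar version of J^T * r
        }
        a -= Jtr / JtJ;
    }
    return a;
}
```

Near the solution, where the residuals are small, the J^T * J approximation of the Hessian is very good and convergence is fast.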
5. Here comes Levenberg and says: this is great but may be unstable. We assumed too much with those second derivatives, and it would be nice to damp them and make things look more like the first-derivative method we had before. To do that we add a damping factor, so the optimization looks like
param[t+1] = param[t] - (J^T * J + lambda * I)^-1 * cost', where I is the identity matrix and lambda is a scalar
Note that when lambda is large, you can discard J^T * J and you are left with standard gradient descent. This is done when convergence slows down; otherwise a small lambda is used, which makes the algorithm look more like Gauss-Newton. Note that J = d_z/d_param, cost' = d_f/d_param and f = z^T * z.
6. Here comes Marquardt and says: great, but I'd like our gradient descent to go faster - for a small diagonal(J^T * J) I'd like a larger step. Hence the final formula, where we divide by the small diagonal elements, resulting in faster convergence when the gradient is shallow:
param[t+1] = param[t] - (J^T * J + lambda * diagonal(J^T * J))^-1 * cost'
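Putting steps 5 and 6 together, here is a self-contained C++ sketch of Levenberg-Marquardt for an assumed two-parameter toy model f(x) = a * exp(b * x). The data, the starting point, and the multiply-lambda-by-10 schedule are illustrative choices, not OpenCV's implementation; with two parameters, the damped normal equations are a 2x2 system we can solve by hand:

```cpp
#include <cmath>

struct Fit { double a, b; };

// Sum of squared residuals for the model f(x) = a * exp(b * x).
static double cost(double a, double b, const double* x, const double* y, int n) {
    double c = 0.0;
    for (int i = 0; i < n; ++i) {
        double r = a * std::exp(b * x[i]) - y[i];
        c += r * r;
    }
    return c;
}

Fit levmarFit() {
    const int n = 8;
    double x[n], y[n];
    for (int i = 0; i < n; ++i) { x[i] = i; y[i] = 2.0 * std::exp(0.3 * i); } // truth: a=2, b=0.3
    double a = 1.0, b = 0.0, lambda = 1e-3;
    for (int t = 0; t < 300; ++t) {
        // Accumulate J^T * J (2x2, symmetric) and the gradient g = J^T * r.
        double m00 = 0, m01 = 0, m11 = 0, g0 = 0, g1 = 0;
        for (int i = 0; i < n; ++i) {
            double e  = std::exp(b * x[i]);
            double r  = a * e - y[i];
            double Ja = e;            // dr/da
            double Jb = a * x[i] * e; // dr/db
            m00 += Ja * Ja; m01 += Ja * Jb; m11 += Jb * Jb;
            g0  += Ja * r;  g1  += Jb * r;
        }
        // Damped normal equations: (J^T J + lambda * diag(J^T J)) * delta = -g
        double d00 = m00 * (1.0 + lambda), d11 = m11 * (1.0 + lambda);
        double det = d00 * d11 - m01 * m01;
        double da  = (m01 * g1 - d11 * g0) / det;
        double db  = (m01 * g0 - d00 * g1) / det;
        if (cost(a + da, b + db, x, y, n) < cost(a, b, x, y, n)) {
            a += da; b += db; lambda *= 0.1; // good step: trust Gauss-Newton more
        } else {
            lambda *= 10.0;                  // bad step: lean toward gradient descent
        }
    }
    return {a, b};
}
```

The accept/reject test on the cost is what makes the method robust: a rejected Gauss-Newton overshoot simply raises lambda until the damped, gradient-descent-like step succeeds.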
Here is a quick summary, using the downhill-skiing analogy, where we show the expression for the parameter change param[t+1] - param[t]:
1. The gradient J points uphill, but you want to ski downhill, so go against the gradient: -k * J;
2. If zig-zagging in a 'snow tube', move in steps inverse to the second derivative (H); you will hopefully go straight down faster: -H^-1 * J;
3. Approximate the second derivative with the product of first derivatives: -(J^T * J)^-1 * J;
4. Damp it if that is too much and go back to the first derivative: -(J^T * J + lambda * I)^-1 * J;
5. Don't want to get stuck on a flat slope? Scale your step inversely to the diagonal of the approximated Hessian: -(J^T * J + lambda * diag(J^T * J))^-1 * J
It is a big mystery why all of this works, since my intuition gives up after a while. Maybe it is a good time to look at this philosophically and apply the principle of Occam's razor: the less we assume, the better off we are, but if we assume too little we may not get very far. All we tried to do is predict (assume) the global behaviour of the function by looking at it locally (imagine getting married after being together for one week). Using the Hessian is like making more assumptions, in terms of its numerous parameters, as opposed to the more conservative Jacobian, which has only a few parameters (assumptions). Finally, we may want to be flexible about being either conservative or risky, and that is what the Levenberg-Marquardt method tries to accomplish.