I have a vector X of 20 real numbers and a vector Y of 20 real numbers.
I want to model them as
y = ax^2+bx + c
How to find the value of 'a' , 'b' and 'c'
and best fit quadratic equation.
Given Values
X = (x1,x2,...,x20)
Y = (y1,y2,...,y20)
i need a formula or procedure to find following values
a = ???
b = ???
c = ???
Thanks in advance.
That is a linear least squares problem. I think the easiest method which gives accurate results is QR decomposition using Householder reflections. It is not something to be explained in a stackoverflow answer, but I hope you will find all that is needed with this links.
If you never heard about these before and don't know how it connects with you problem:
A = [[x1^2, x1, 1]; [x2^2, x2, 1]; ...]
Y = [y1; y2; ...]
Now you want to find v = [a; b; c]
such that A*v
is as close as possible to Y
, which is exactly what least squares problem is all about.
Everything @Bartoss said is right, +1. I figured I just add a practical implementation here, without QR decomposition. You want to evaluate the values of a,b,c such that the distance between measured and fitted data is minimal. You can pick as measure
sum(ax^2+bx + c -y)^2)
where the sum is over the elements of vectors x,y.
Then, a minimum implies that the derivative of the quantity with respect to each of a,b,c is zero:
d (sum(ax^2+bx + c -y)^2) /da =0
d (sum(ax^2+bx + c -y)^2) /db =0
d (sum(ax^2+bx + c -y)^2) /dc =0
these equations are
2(sum(ax^2+bx + c -y)*x^2)=0
2(sum(ax^2+bx + c -y)*x) =0
2(sum(ax^2+bx + c -y)) =0
Dividing by 2, the above can be rewritten as
a*sum(x^4) +b*sum(x^3) + c*sum(x^2) =sum(y*x^2)
a*sum(x^3) +b*sum(x^2) + c*sum(x) =sum(y*x)
a*sum(x^2) +b*sum(x) + c*N =sum(y)
where N=20
in your case. A simple code in python showing how to do so follows.
from numpy import random, array
from scipy.linalg import solve
import matplotlib.pylab as plt
a, b, c = 6., 3., 4.
N = 20
x = random.rand((N))
y = a * x ** 2 + b * x + c
y += random.rand((20)) #add a bit of noise to make things more realistic
x4 = (x ** 4).sum()
x3 = (x ** 3).sum()
x2 = (x ** 2).sum()
M = array([[x4, x3, x2], [x3, x2, x.sum()], [x2, x.sum(), N]])
K = array([(y * x ** 2).sum(), (y * x).sum(), y.sum()])
A, B, C = solve(M, K)
print 'exact values ', a, b, c
print 'calculated values', A, B, C
fig, ax = plt.subplots()
ax.plot(x, y, 'b.', label='data')
ax.plot(x, A * x ** 2 + B * x + C, 'r.', label='estimate')
ax.legend()
plt.show()
A much faster way to implement solution is to use a nonlinear least squares algorithm. This will be faster to write, but not faster to run. Using the one provided by scipy
,
from scipy.optimize import leastsq
def f(arg):
a,b,c=arg
return a*x**2+b*x+c-y
(A,B,C),_=leastsq(f,[1,1,1])#you must provide a first guess to start with in this case.