I have gone through many code samples on Stack Overflow and written my own along the same lines. There is some problem with this code that I am unable to understand. I am storing the values of theta1 and theta2, and also the cost function, for analysis purposes. The data for x and y can be downloaded from this OpenClassroom page; it has x and y data in the form of .dat files that you can open in Notepad.
%Single Variate Gradient Descent Algorithm%%
clc
clear all
close all;
% Step 1 Load x series/ Input data and Output data* y series
x=load('D:\Office Docs_Jay\software\ex2x.dat');
y=load('D:\Office Docs_Jay\software\ex2y.dat');
%Plot the input vectors
plot(x,y,'o');
ylabel('Height in meters');
xlabel('Age in years');
% Step 2 Add an extra column of ones in input vector
[m n]=size(x);
X=[ones(m,1) x];%Concatenate the ones column with x;
% Step 3 Create Theta vector
theta=zeros(n+1,1);%theta 0,1
% Create temporary values for storing summation
temp1=0;
temp2=0;
% Define Learning Rate alpha and Max Iterations
alpha=0.07;
max_iterations=1;
% Step 4 Iterate over loop
for i=1:1:max_iterations
    % Calculate hypothesis for all training examples
    for k=1:1:m
        h(k)=theta(1,1)+theta(2,1)*X(k,2); %#ok<AGROW>
        temp1=temp1+(h(k)-y(k));
        temp2=temp2+(h(k)-y(k))*X(k,2);
    end
    % Simultaneous update
    tmp1=theta(1,1)-(alpha*1/(2*m)*temp1);
    tmp2=theta(2,1)-(alpha*(1/(2*m))*temp2);
    theta(1,1)=tmp1;
    theta(2,1)=tmp2;
    theta1_history(i)=theta(2,1); %#ok<AGROW>
    theta0_history(i)=theta(1,1); %#ok<AGROW>
    % Step 5 Calculate cost function
    tmp3=0;
    tmp4=0;
    for p=1:m
        tmp3=tmp3+theta(1,1)+theta(2,1)*X(p,1);
        tmp4=tmp4+theta(1,1)+theta(2,1)*X(p,2);
    end
    J1_theta0(i)=tmp3*(1/(2*m)); %#ok<AGROW>
    J2_theta1(i)=tmp4*(1/(2*m)); %#ok<AGROW>
end
theta
hold on;
plot(X(:,2),theta(1,1)+theta(2,1)*X(:,2));
I am getting the value of theta as 0.0373 and 0.1900; it should be 0.0745 and 0.3800. The values I get are approximately half of what I am expecting.
From the values of Ɵ (theta) you expect and the values the program produces, one thing can be noticed: the expected values are twice the program's outcome. The likely mistake is that you used 1/(2*m) in place of 1/m when calculating the derivative. In the derivative, the 2 in the denominator vanishes, because the original term (hƟ(x) - y)^2 generates a factor 2*(hƟ(x) - y) on differentiation, and the 2s cancel out. Modify these code lines:

tmp1=theta(1,1)-(alpha*1/(2*m)*temp1);
tmp2=theta(2,1)-(alpha*(1/(2*m))*temp2);

to

tmp1=theta(1,1)-(alpha*(1/m)*temp1);
tmp2=theta(2,1)-(alpha*(1/m)*temp2);
Hope it helps.
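To spell out the cancellation, this is the standard least-squares calculus behind the point above:

```latex
J(\theta) = \frac{1}{2m}\sum_{k=1}^{m}\left(h_\theta(x^{(k)}) - y^{(k)}\right)^2
\quad\Longrightarrow\quad
\frac{\partial J}{\partial \theta_j}
= \frac{1}{2m}\sum_{k=1}^{m} 2\left(h_\theta(x^{(k)}) - y^{(k)}\right)x_j^{(k)}
= \frac{1}{m}\sum_{k=1}^{m}\left(h_\theta(x^{(k)}) - y^{(k)}\right)x_j^{(k)}
```

So the cost keeps its 1/(2*m), but the update uses 1/m: theta_j := theta_j - (alpha/m) * sum.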
I have been trying to implement the iterative step with matrices and vectors (i.e. not updating each parameter of theta separately). Here is what I came up with (only the gradient step is here):
The hard part to grasp is that the "sum" in the gradient step of the previous examples is actually performed by the matrix multiplication X'*err. You can also write it as (err'*X)'.
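The answer's own MATLAB snippet is not reproduced above; to make the X'*err point concrete, here is a NumPy sketch (the small data set is made up for illustration) checking that the single matrix product reproduces the per-term sums temp1/temp2 from the question's loop:

```python
import numpy as np

# Made-up data standing in for ex2x.dat / ex2y.dat
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([0.9, 1.9, 3.1, 3.9])

m = len(x)
X = np.column_stack([np.ones(m), x])   # add the column of ones
theta = np.zeros(2)
alpha = 0.07

# Loop-style sums, as in the question
h = theta[0] + theta[1] * X[:, 1]
temp1 = np.sum(h - y)                  # sum of (h(k) - y(k))
temp2 = np.sum((h - y) * X[:, 1])      # sum of (h(k) - y(k)) * x(k)

# Vectorized version: the same sums fall out of one matrix product
err = X @ theta - y
grad = X.T @ err                       # grad[0] == temp1, grad[1] == temp2

# One simultaneous gradient step with the corrected 1/m factor
theta = theta - alpha / m * grad
```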
You need to put temp1=0; temp2=0; as the first statements inside the iteration loop. If you don't, the current temp values will carry over into the next iteration, which is wrong.
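A minimal sketch of that fix (a Python rendering with toy data made up for illustration), with the accumulators zeroed at the top of every pass:

```python
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
m = len(x)
theta0 = theta1 = 0.0
alpha = 0.1

for i in range(2):
    # Reset the accumulators at the top of EVERY iteration;
    # otherwise the sums from the previous pass leak into this one.
    temp1 = 0.0
    temp2 = 0.0
    for k in range(m):
        h = theta0 + theta1 * x[k]
        temp1 += h - y[k]
        temp2 += (h - y[k]) * x[k]
    theta0 -= alpha / m * temp1   # simultaneous update, 1/m factor
    theta1 -= alpha / m * temp2
```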
Here are some comments:
max_iterations is set to 1. Gradient descent is typically run until either the decrease in the objective function or the magnitude of the gradient falls below some threshold, which will usually take more than one iteration.

The factor of 1/(2*m) is not technically correct. This should not cause the algorithm to fail, but it will effectively halve the learning rate.

You are not computing the correct objective. The correct linear regression objective should be either one-half times the average of the squared residuals or one-half times the sum of the squared residuals.
Rather than using for-loops you should take advantage of MATLAB's vectorized computations. For instance,

res=X*theta-y; obj=.5/m*(res'*res);

should compute the residuals (res) and the linear regression objective (obj).

I managed to create an algorithm that uses more of the vectorized properties that MATLAB supports. My algorithm is a little different from yours but performs the gradient descent process you ask about. After executing and validating it (using the polyfit function), I think that the values expected in OpenClassroom (exercise 2), theta(0) = 0.0745 and theta(1) = 0.3800, are wrong after 1500 iterations with step 0.07 (I take no responsibility for that claim). That is why I plotted my results together with the data in one plot, and your expected results with the data in another plot, and I saw a big difference in how the line fits the data.
First of all, have a look at the code:
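The answer's MATLAB listing is not included above; as a sketch of the approach described (1500 iterations, step 0.07, validated against a closed-form fit), here is a NumPy rendering on synthetic data, since the original files and code are not reproduced:

```python
import numpy as np

# Synthetic stand-in for the ex2x.dat / ex2y.dat files (not reproduced here)
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0, 50)
y = 0.5 + 0.8 * x + rng.normal(0.0, 0.01, 50)

m = len(x)
X = np.column_stack([np.ones(m), x])   # leading column of ones
theta = np.zeros(2)
alpha = 0.07
iterations = 1500

for _ in range(iterations):
    err = X @ theta - y                 # residuals at the current theta
    theta -= alpha / m * (X.T @ err)    # one fully vectorized simultaneous update

# Cross-check against a closed-form least-squares fit (the answer uses polyfit)
slope, intercept = np.polyfit(x, y, 1)
```

After 1500 iterations theta should agree with the polyfit coefficients to within a small tolerance, which is the validation step the answer describes.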
OK, the results are as follows:

With the theta(0) and theta(1) produced by my algorithm, the line fits the data.

With theta(0) and theta(1) fixed to the expected values, the line does not fit the data.