I have gone through many code samples on Stack Overflow and written my own along the same lines. There is some problem with this code that I am unable to understand. I am storing the values of theta1 and theta2, and also the cost function, for analysis purposes.
The data for x and y can be downloaded from this
OpenClassroom page. It has the x and y data in the form of .dat files that you can open in Notepad.
% Single-Variate Gradient Descent Algorithm
clc
clear all
close all;
% Step 1: Load the input data (x series) and the output data (y series)
x=load('D:\Office Docs_Jay\software\ex2x.dat');
y=load('D:\Office Docs_Jay\software\ex2y.dat');
%Plot the input vectors
plot(x,y,'o');
ylabel('Height in meters');
xlabel('Age in years');
% Step 2 Add an extra column of ones in input vector
[m n]=size(x);
X=[ones(m,1) x];%Concatenate the ones column with x;
% Step 3 Create Theta vector
theta=zeros(n+1,1);%theta 0,1
% Create temporary values for storing summation
temp1=0;
temp2=0;
% Define Learning Rate alpha and Max Iterations
alpha=0.07;
max_iterations=1;
% Step 4 Iterate over loop
for i=1:1:max_iterations
%Calculate Hypothesis for all training example
for k=1:1:m
h(k)=theta(1,1)+theta(2,1)*X(k,2); %#ok<AGROW>
temp1=temp1+(h(k)-y(k));
temp2=temp2+(h(k)-y(k))*X(k,2);
end
% Simultaneous Update
tmp1=theta(1,1)-(alpha*1/(2*m)*temp1);
tmp2=theta(2,1)-(alpha*(1/(2*m))*temp2);
theta(1,1)=tmp1;
theta(2,1)=tmp2;
theta1_history(i)=theta(2,1); %#ok<AGROW>
theta0_history(i)=theta(1,1); %#ok<AGROW>
% Step 5 Calculate cost function
tmp3=0;
tmp4=0;
for p=1:m
tmp3=tmp3+theta(1,1)+theta(2,1)*X(p,1);
tmp4=tmp4+theta(1,1)+theta(2,1)*X(p,2);
end
J1_theta0(i)=tmp3*(1/(2*m)); %#ok<AGROW>
J2_theta1(i)=tmp4*(1/(2*m)); %#ok<AGROW>
end
theta
hold on;
plot(X(:,2),theta(1,1)+theta(2,1)*X(:,2));
I am getting the value of theta as
0.0373 and 0.1900
but it should be 0.0745 and 0.3800; the values I get are approximately half of what I am expecting.
I have been trying to implement the iterative step with matrices and vectors (i.e not update each parameter of theta).
Here is what I came up with (only the gradient step is here):
h = X * theta;                           % hypothesis
err = h - y;                             % error
gradient = alpha * (1 / m) * (X' * err); % scaled gradient
theta = theta - gradient;                % update step
The hard part to grasp is that the "sum" in the gradient step of the previous examples is actually performed by the matrix multiplication X'*err. You can also write it as (err'*X)'.
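To see this concretely, here is a minimal sketch with made-up numbers (the variable names just mirror the snippet above) showing that the explicit per-parameter sums and X'*err produce the same result:
% Tiny made-up example: 3 training examples, 2 parameters
X = [1 2; 1 3; 1 4];    % ones column plus one feature
y = [2; 3; 5];
theta = [0.1; 0.2];
err = X * theta - y;    % error vector
% Explicit loop, as in the non-vectorized version
grad_loop = zeros(2,1);
for k = 1:size(X,1)
    grad_loop(1) = grad_loop(1) + err(k) * X(k,1);
    grad_loop(2) = grad_loop(2) + err(k) * X(k,2);
end
% Vectorized form: one matrix product performs both sums at once
grad_vec = X' * err;
disp([grad_loop grad_vec])   % the two columns match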
I managed to create an algorithm that uses more of the vectorized properties that Matlab supports.
My algorithm is a little different from yours but performs the gradient descent process as you ask.
After running and validating it (using the polyfit function), I think that the values expected in openclassroom (exercise 2), theta(0) = 0.0745 and theta(1) = 0.3800, are wrong for 1500 iterations with step 0.07 (I take no responsibility for that claim). That is why I plotted my results with the data in one plot, and your required results with the data in another plot, and I saw a big difference in the data fitting.
First of all, have a look at the code:
% Machine Learning : Linear Regression
clear all; close all; clc;
%% ======================= Plotting Training Data =======================
fprintf('Plotting Data ...\n')
x = load('ex2x.dat');
y = load('ex2y.dat');
% Plot Data
plot(x,y,'rx');
xlabel('X -> Input') % x-axis label
ylabel('Y -> Output') % y-axis label
%% =================== Initialize Linear regression parameters ===================
m = length(y); % number of training examples
% initialize fitting parameters - all zeros
theta=zeros(2,1);%theta 0,1
% Some gradient descent settings
iterations = 1500;
Learning_step_a = 0.07; % step parameter
%% =================== Gradient descent ===================
fprintf('Running Gradient Descent ...\n')
%Compute Gradient descent
% Initialize Objective Function History
J_history = zeros(iterations, 1);
% run gradient descent
for iter = 1:iterations
    % In every iteration calculate the hypothesis
    hypothesis = theta(1).*x + theta(2);
    % Update theta variables (simultaneous update via temporaries)
    temp0 = theta(1) - Learning_step_a * (1/m) * sum((hypothesis-y).*x);
    temp1 = theta(2) - Learning_step_a * (1/m) * sum(hypothesis-y);
    theta(1) = temp0;
    theta(2) = temp1;
    % Save objective function value; note 1/(2*m), not (1/2*m), which equals m/2
    J_history(iter) = (1/(2*m)) * sum((hypothesis-y).^2);
end
% print theta to screen
fprintf('Theta found by gradient descent: %f %f\n',theta(1), theta(2));
fprintf('Minimum of objective function is %f \n',J_history(iterations));
% Plot the linear fit
hold on; % keep previous plot visible
plot(x, theta(1)*x+theta(2), '-')
% Validate with polyfit fnc
poly_theta = polyfit(x,y,1);
plot(x, poly_theta(1)*x+poly_theta(2), 'y--');
legend('Training data', 'Linear regression','Linear regression with polyfit')
hold off
figure
% Plot Data
plot(x,y,'rx');
xlabel('X -> Input') % x-axis label
ylabel('Y -> Output') % y-axis label
hold on; % keep previous plot visible
% Validate with polyfit fnc
poly_theta = polyfit(x,y,1);
plot(x, poly_theta(1)*x+poly_theta(2), 'y--');
% for theta values that you are saying
theta(1)=0.0745; theta(2)=0.3800;
plot(x, theta(1)*x+theta(2), 'g--')
legend('Training data', 'Linear regression with polyfit','Your thetas')
hold off
OK, the results are as follows:
With the theta(0) and theta(1) produced by my algorithm, the line fits the data.
With theta(0) and theta(1) as the fixed expected values, the line does not fit the data.
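As an extra sanity check (not part of the answer above, just a sketch), one can also compare against MATLAB's closed-form least-squares solution via the backslash operator, assuming x and y are the column vectors loaded earlier:
X_design = [ones(length(x),1) x];   % design matrix with an intercept column
theta_ls = X_design \ y;            % closed-form least-squares solution
% Note: here theta_ls(1) is the intercept and theta_ls(2) the slope,
% the reverse of the theta ordering used in the code above.
fprintf('Least squares: intercept %f, slope %f\n', theta_ls(1), theta_ls(2));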
You need to put
temp1=0; temp2=0;
as the first statements inside the iteration loop, because if you don't, the accumulated temp values will carry over and influence the next iteration, which is wrong. The corrected loop is sketched below.
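In other words, the accumulation part of your outer loop should look like this (only the relevant part is shown):
for i = 1:max_iterations
    temp1 = 0;   % reset the accumulators at the start of every iteration
    temp2 = 0;
    for k = 1:m
        h(k) = theta(1,1) + theta(2,1)*X(k,2);
        temp1 = temp1 + (h(k) - y(k));
        temp2 = temp2 + (h(k) - y(k))*X(k,2);
    end
    % ... simultaneous update of theta as before ...
end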
From the expected values of θ (theta) and the program's outcome, one thing can be noticed: the expected value is twice that of the outcome.
The mistake you probably made is that you used 1/(2*m) in place of 1/m in the derivative calculation. In the derivative, the 2 in the denominator vanishes, because the original term was (h_θ(x) - y)^2, which on differentiation generates 2*(h_θ(x) - y). The 2s cancel out.
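For reference, here is the standard derivation (in LaTeX) showing where the cancellation happens, with $x_0^{(i)} = 1$ for the intercept term:
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
\frac{\partial J}{\partial \theta_j}
  = \frac{1}{2m} \sum_{i=1}^{m} 2 \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
  = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}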
Modify these code lines (the theta update step):
tmp1=theta(1,1)-(alpha*1/(2*m)*temp1);
tmp2=theta(2,1)-(alpha*(1/(2*m))*temp2);
to
tmp1=theta(1,1)-(alpha*(1/m)*temp1);
tmp2=theta(2,1)-(alpha*(1/m)*temp2);
(The 1/(2*m) factor is correct in the cost function itself, but not in the update step.)
Hope it helps.