Gradient Descent Matlab implementation

Posted 2019-04-09 20:33

Question:

I have gone through many code samples on Stack Overflow and written my own along the same lines. There is a problem with this code that I am unable to understand. I am storing the values of theta1 and theta2, as well as the cost function, for analysis. The data for x and y can be downloaded from this Openclassroom page, which provides the x and y data as .dat files that you can open in Notepad.

    %Single Variate Gradient Descent Algorithm%%
    clc
clear all
close all;
% Step 1 Load x series/ Input data and Output data* y series

x=load('D:\Office Docs_Jay\software\ex2x.dat');
y=load('D:\Office Docs_Jay\software\ex2y.dat');
%Plot the input vectors
plot(x,y,'o');
ylabel('Height in meters');
xlabel('Age in years');

% Step 2 Add an extra column of ones in input vector
[m n]=size(x);
X=[ones(m,1) x];%Concatenate the ones column with x;
% Step 3 Create Theta vector
theta=zeros(n+1,1);%theta 0,1
% Create temporary values for storing summation

temp1=0;
temp2=0;
% Define Learning Rate alpha and Max Iterations

alpha=0.07;
max_iterations=1;
      % Step 4 Iterate over loop
      for i=1:1:max_iterations

     %Calculate Hypothesis for all training example
     for k=1:1:m
        h(k)=theta(1,1)+theta(2,1)*X(k,2); %#ok<AGROW>
        temp1=temp1+(h(k)-y(k));
        temp2=temp2+(h(k)-y(k))*X(k,2);
     end
     % Simultaneous Update
      tmp1=theta(1,1)-(alpha*1/(2*m)*temp1);
      tmp2=theta(2,1)-(alpha*(1/(2*m))*temp2);
      theta(1,1)=tmp1;
      theta(2,1)=tmp2;
      theta1_history(i)=theta(2,1); %#ok<AGROW>
      theta0_history(i)=theta(1,1); %#ok<AGROW>
      % Step 5 Calculate cost function
      tmp3=0;
      tmp4=0;
      for p=1:m
        tmp3=tmp3+theta(1,1)+theta(2,1)*X(p,1);
        tmp4=tmp4+theta(1,1)+theta(2,1)*X(p,2);
      end
      J1_theta0(i)=tmp3*(1/(2*m)); %#ok<AGROW>
      J2_theta1(i)=tmp4*(1/(2*m)); %#ok<AGROW>


      end
      theta
      hold on;
      plot(X(:,2),theta(1,1)+theta(2,1)*X);

I am getting theta as 0.0373 and 0.1900; it should be 0.0745 and 0.3800.

The expected values are approximately double what I am getting.

Answer 1:

I have been trying to implement the iterative step with matrices and vectors (i.e. without updating each parameter of theta separately). Here is what I came up with (only the gradient step is shown):

h = X * theta;  % hypothesis
err = h - y;    % error
gradient = alpha * (1 / m) * (X' * err); % gradient scaled by the learning rate
theta = theta - gradient;

The hard part to grasp is that the "sum" in the gradient step of the previous examples is actually performed by the matrix multiplication X'*err. You can also write it as (err'*X)'.
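
As a rough check of that claim, the sketch below accumulates the sum in an explicit loop and compares it with X'*err. The data values are made up for illustration (they are not the Openclassroom data), and the names err_sum, vec_sum and max_diff are hypothetical.

```matlab
% Illustrative data only (not the ex2x.dat / ex2y.dat values)
x = [2; 3; 5; 7];
y = [0.8; 0.9; 1.1; 1.2];
m = length(y);
X = [ones(m, 1) x];      % design matrix with an intercept column
theta = zeros(2, 1);

% Loop form: add each training example's contribution to the sum
err_sum = zeros(2, 1);
for k = 1:m
    h_k = X(k, :) * theta;                        % hypothesis for example k
    err_sum = err_sum + (h_k - y(k)) * X(k, :)';  % (h - y) times [1; x_k]
end

% Vectorized form: one matrix product performs the same sum
err = X * theta - y;
vec_sum = X' * err;

% Largest difference between the two forms (floating-point noise at most)
max_diff = max(abs(err_sum - vec_sum));
```

Both forms produce a 2-by-1 vector, one entry per parameter, so the result can be subtracted from theta directly.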



Answer 2:

I managed to create an algorithm that makes more use of the vectorized operations that MATLAB supports. My algorithm is a little different from yours, but it performs gradient descent as you ask. After running it and validating the result with the polyfit function, I think that the values expected in Openclassroom (exercise 2), theta(0) = 0.0745 and theta(1) = 0.3800, are wrong after 1500 iterations with step 0.07 (I take no responsibility for that). For this reason I plotted my results with the data in one plot and your required results with the data in another, and I saw a big difference in how well the lines fit the data.

First, have a look at the code:

% Machine Learning : Linear Regression

clear all; close all; clc;

%% ======================= Plotting Training Data =======================
fprintf('Plotting Data ...\n')

x = load('ex2x.dat');
y = load('ex2y.dat');

% Plot Data
plot(x,y,'rx');
xlabel('X -> Input') % x-axis label
ylabel('Y -> Output') % y-axis label

%% =================== Initialize Linear regression parameters ===================
 m = length(y); % number of training examples

% initialize fitting parameters - all zeros
theta=zeros(2,1);%theta 0,1

% Some gradient descent settings
iterations = 1500;
Learning_step_a = 0.07; % step parameter

%% =================== Gradient descent ===================

fprintf('Running Gradient Descent ...\n')

%Compute Gradient descent

% Initialize Objective Function History
J_history = zeros(iterations, 1);

m = length(y); % number of training examples

% run gradient descent    
for iter = 1:iterations

   % In every iteration calculate hypothesis
   hypothesis=theta(1).*x+theta(2);

   % Update theta variables
   temp0=theta(1) - Learning_step_a * (1/m)* sum((hypothesis-y).* x);
   temp1=theta(2) - Learning_step_a * (1/m) *sum(hypothesis-y);

   theta(1)=temp0;
   theta(2)=temp1;

   % Save objective function 
   J_history(iter)=(1/(2*m))*sum(( hypothesis-y ).^2);

end

% print theta to screen
fprintf('Theta found by gradient descent: %f %f\n',theta(1),  theta(2));
fprintf('Minimum of objective function is %f \n',J_history(iterations));

% Plot the linear fit
hold on; % keep previous plot visible 
plot(x, theta(1)*x+theta(2), '-')

% Validate with polyfit fnc
poly_theta = polyfit(x,y,1);
plot(x, poly_theta(1)*x+poly_theta(2), 'y--');
legend('Training data', 'Linear regression','Linear regression with polyfit')
hold off 

figure
% Plot Data
plot(x,y,'rx');
xlabel('X -> Input') % x-axis label
ylabel('Y -> Output') % y-axis label

hold on; % keep previous plot visible
% Validate with polyfit fnc
poly_theta = polyfit(x,y,1);
plot(x, poly_theta(1)*x+poly_theta(2), 'y--');

% for theta values that you are saying
theta(1)=0.0745;  theta(2)=0.3800;
plot(x, theta(1)*x+theta(2), 'g--')
legend('Training data', 'Linear regression with polyfit','Your thetas')
hold off 

OK, the results are as follows:

With theta(0) and theta(1) as produced by my algorithm, the line fits the data.

With theta(0) and theta(1) fixed to your values, the line does not fit the data.



Answer 3:

Here are some comments:

  1. max_iterations is set to 1. Gradient descent is typically run until either the decrease in the objective function is below some threshold or the magnitude of the gradient is below some threshold, which would likely be more than one iteration.

  2. The factor of 1/(2*m) in the update is not technically correct; it should be 1/m. This should not cause the algorithm to fail, but it effectively halves the learning rate.

  3. You are not computing the correct objective. The correct linear regression objective should either be one-half times the average of the squared residuals or one-half times the sum of the squared residuals.

  4. Rather than using for-loops, you should take advantage of MATLAB's vectorized computations. For instance, res=X*theta-y; obj=.5/m*(res'*res); should compute the residuals (res) and the linear regression objective (obj).
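
Points 1, 3 and 4 can be combined into a short sketch, shown here under stated assumptions: the data values are made up (not the Openclassroom data), and alpha, tol and max_iterations are hypothetical settings chosen so that this small example converges. The loop stops once the gradient norm drops below tol, and the result is compared with the least-squares solution from the backslash operator.

```matlab
% Illustrative data only; alpha, tol and max_iterations are assumed values
x = [2; 3; 5; 7];
y = [0.8; 0.9; 1.1; 1.2];
m = length(y);
X = [ones(m, 1) x];          % design matrix with an intercept column
theta = zeros(2, 1);
alpha = 0.05;                % learning rate
tol = 1e-8;                  % stop when the gradient norm falls below this
max_iterations = 100000;     % safety cap on the number of steps

for iter = 1:max_iterations
    res = X * theta - y;             % residuals
    grad = (1 / m) * (X' * res);     % gradient of the objective (note 1/m)
    if norm(grad) < tol
        break;                       % converged
    end
    theta = theta - alpha * grad;    % simultaneous update of both parameters
end

res = X * theta - y;
obj = 0.5 / m * (res' * res);        % one-half times the mean squared residual
theta_ls = X \ y;                    % direct least-squares solution for comparison
```

On this toy data theta and theta_ls should agree closely. A threshold on the decrease in obj between iterations would be an equally reasonable stopping rule.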



Answer 4:

You need to put temp1=0 and temp2=0 as the first statements in the iteration loop. If you don't, the current values of temp1 and temp2 will carry over into the next iteration, which is wrong.
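
A minimal sketch of the loop with the accumulators reset on every pass might look as follows. It uses made-up data rather than the Openclassroom files, scales the sums by 1/m (the standard gradient scaling, which is an assumption relative to the question's code), and checks the result against a vectorized reference; theta_vec is a hypothetical name.

```matlab
% Illustrative data only (not the ex2x.dat / ex2y.dat values)
x = [2; 3; 5; 7];
y = [0.8; 0.9; 1.1; 1.2];
m = length(y);
X = [ones(m, 1) x];
alpha = 0.07;
max_iterations = 3;

theta = zeros(2, 1);
for i = 1:max_iterations
    temp1 = 0;   % reset the accumulators at the start of every iteration
    temp2 = 0;
    for k = 1:m
        h_k = theta(1) + theta(2) * X(k, 2);      % hypothesis for example k
        temp1 = temp1 + (h_k - y(k));
        temp2 = temp2 + (h_k - y(k)) * X(k, 2);
    end
    % Simultaneous update of both parameters
    theta = theta - alpha * (1 / m) * [temp1; temp2];
end

% Reference: the same three steps written in vectorized form
theta_vec = zeros(2, 1);
for i = 1:max_iterations
    theta_vec = theta_vec - alpha * (1 / m) * (X' * (X * theta_vec - y));
end
```

If the two resets are moved back outside the outer loop, theta and theta_vec diverge from the second iteration onward.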



Answer 5:

Comparing your expected values of θ (theta) with the program's output, one thing stands out: the expected values are twice the output.

The likely mistake is that you used 1/(2*m) in place of 1/m in the code that computes the derivative. In the derivative the 2 in the denominator vanishes, because the original term is (hθ(x) - y)^2, which on differentiation yields 2*(hθ(x) - y). The 2s cancel out.
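
Written out, the cancellation looks like this, assuming the usual squared-error cost with hypothesis h_θ(x) = θ_0 + θ_1 x (the superscript (i) indexes training examples, and x_0 = 1 by convention):

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

\frac{\partial J}{\partial \theta_j}
  = \frac{1}{2m} \sum_{i=1}^{m} 2 \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
  = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
```

The factor 1/(2m) therefore belongs only in the cost J, while the update uses 1/m.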

Modify the update lines, which are where the derivative is applied:

tmp1=theta(1,1)-(alpha*1/(2*m)*temp1);
tmp2=theta(2,1)-(alpha*(1/(2*m))*temp2);

to

tmp1=theta(1,1)-(alpha*(1/m)*temp1);
tmp2=theta(2,1)-(alpha*(1/m)*temp2);

Hope it helps.