I created an Octave script for training a neural network with one hidden layer using backpropagation, but it cannot seem to fit the XOR function.
x - Input 4x2 matrix: [0 0; 0 1; 1 0; 1 1]
y - Output 4x1 matrix: [0; 1; 1; 0]
theta - Hidden / output layer weights
z - Weighted sums
a - Activation function applied to weighted sums
m - Sample count (4 here)
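For reference, a minimal data setup matching the definitions above (the layer-size names inputCount, hiddenCount and outputCount match the initialization snippet below; the hidden-layer size of 2 is just an example value):

x = [0 0; 0 1; 1 0; 1 1];    % 4x2 input matrix
y = [0; 1; 1; 0];            % 4x1 target vector
m = size(x, 1);              % sample count, 4 here
inputCount  = 2;             % units in the input layer
hiddenCount = 2;             % hidden units (at least 2 are needed to represent XOR)
outputCount = 1;             % single sigmoid output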
My weights are initialized as follows:
epsilon_init = 0.12;
theta1 = rand(hiddenCount, inputCount + 1) * 2 * epsilon_init * epsilon_init;
theta2 = rand(outputCount, hiddenCount + 1) * 2 * epsilon_init * epsilon_init;
Feed forward:
a1 = x;
a1_with_bias = [ones(m, 1) a1];
z2 = a1_with_bias * theta1';
a2 = sigmoid(z2);
a2_with_bias = [ones(size(a2, 1), 1) a2];
z3 = a2_with_bias * theta2';
a3 = sigmoid(z3);
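sigmoid and sigmoidGradient are the usual logistic helpers, shown here for completeness and assuming the standard definitions:

function g = sigmoid(z)
  % element-wise logistic function
  g = 1 ./ (1 + exp(-z));
end

function g = sigmoidGradient(z)
  % element-wise derivative of the logistic function at z
  g = sigmoid(z) .* (1 - sigmoid(z));
end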
Then I compute the logistic cost function:
j = -sum((y .* log(a3) + (1 - y) .* log(1 - a3))(:)) / m;
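For illustration: if the network outputs a constant 0.5 for every example, each term contributes -log(0.5), so this cost comes out to log(2) no matter what y is:

a3_const = 0.5 * ones(m, 1);
j_const = -sum(y .* log(a3_const) + (1 - y) .* log(1 - a3_const)) / m;   % log(2), about 0.693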
Back propagation:
delta2 = (a3 - y);
gradient2 = delta2' * a2_with_bias / m;
delta1 = (delta2 * theta2(:, 2:end)) .* sigmoidGradient(z2);
gradient1 = delta1' * a1_with_bias / m;
The gradients were verified to be correct using gradient checking.
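The check itself is omitted; it is the usual central-difference comparison, sketched here with a hypothetical computeCost(theta1, theta2, x, y) helper that runs the feed-forward pass above and returns j:

epsilon = 1e-4;
numgrad1 = zeros(size(theta1));
for i = 1:numel(theta1)
  perturb = zeros(size(theta1));
  perturb(i) = epsilon;
  % central-difference approximation of dj / dtheta1(i)
  numgrad1(i) = (computeCost(theta1 + perturb, theta2, x, y) - ...
                 computeCost(theta1 - perturb, theta2, x, y)) / (2 * epsilon);
end
% numgrad1 should match gradient1 to several decimal places; theta2 is checked the same way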
I then use these gradients to find the optimal values for theta with gradient descent, though using Octave's fminunc function yields the same results. The cost function converges to ln(2) (or 0.5 for a squared-error cost function) because the network outputs 0.5 for all four inputs, no matter how many hidden units I use.
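The gradient-descent loop is the plain batch version sketched below (the learning rate and iteration count are arbitrary placeholders, not the exact values I use):

alpha = 1;                   % learning rate (placeholder)
for iter = 1:10000
  % ... feed forward and back propagation as above, producing gradient1 and gradient2 ...
  theta1 = theta1 - alpha * gradient1;
  theta2 = theta2 - alpha * gradient2;
end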
Does anyone know where my mistake is?