I've been watching some videos on deep learning/convolutional neural networks, like here and here, and I've tried to implement my own in C++. I kept the input data fairly simple for my first attempt: the idea is to differentiate between a cross and a circle, and I have a small data set of around 25 of each (64*64 images). They look like this:
The network itself is five layers:
- Convolution (5 filters, size 3, stride 1, with a ReLU)
- MaxPool (size 2)
- Convolution (1 filter, size 3, stride 1, with a ReLU)
- MaxPool (size 2)
- Linear Regression classifier
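For what it's worth, this is how I expect the volume sizes to work out through that stack, assuming unpadded convolutions and floor division when the pooling input is odd (convOut/poolOut below are just illustrative helpers, not part of my classes):

    // Back-of-the-envelope shape check (no padding, floor division in pooling)
    int convOut(int in, int filterSize, int stride) { return (in - filterSize)/stride + 1; }
    int poolOut(int in, int poolSize)               { return in/poolSize; }

    //   64 -> conv 3x3 -> 62 -> pool 2 -> 31 -> conv 3x3 -> 29 -> pool 2 -> 14
    // so the final classifier sees a 14 x 14 x 1 volume, i.e. 196 inputs plus a bias.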
My issue is that my network isn't converging on anything. None of the weights appear to change, and if I run it the predictions mostly stay the same, apart from the occasional outlier that jumps up before returning to where it was on the next iteration.
The convolutional layer training looks something like this (I've removed some loops to make it cleaner):
// Yeah, I know I should change the shared_ptr<float>
void ConvolutionalNetwork::Train(std::shared_ptr<float> input, std::shared_ptr<float> outputGradients, float label)
{
    float biasGradient = 0.0f;

    // Calculate the deltas with respect to the input.
    for (int layer = 0; layer < m_Filters.size(); ++layer)
    {
        // Pseudo-code, each loop is on its own line in the actual code
        for z < depth, x < width - filterSize, y < height - filterSize
        {
            int newImageIndex = layer*m_OutputWidth*m_OutputHeight + y*m_OutputWidth + x;

            for the bounds of the filter (u, v)
            {
                // Find the index in the input image
                int imageIndex = x + (y+v)*m_OutputWidth + z*m_OutputHeight*m_OutputWidth;
                int kernelIndex = u + v*m_FilterSize + z*m_FilterSize*m_FilterSize;

                m_pGradients.get()[imageIndex] += outputGradients.get()[newImageIndex]*input.get()[imageIndex];
                m_GradientSum[layer].get()[kernelIndex] += m_pGradients.get()[imageIndex] * m_Filters[layer].get()[kernelIndex];
                biasGradient += m_GradientSum[layer].get()[kernelIndex];
            }
        }
    }

    // Update the weights
    for (int layer = 0; layer < m_Filters.size(); ++layer)
    {
        for z < depth, u and v < filterSize
        {
            // Find the index in the kernel
            int kernelIndex = u + v*m_FilterSize + z*m_FilterSize*m_FilterSize;
            m_Filters[layer].get()[kernelIndex] -= learningRate*m_GradientSum[layer].get()[kernelIndex];
        }
        m_pBiases.get()[layer] -= learningRate*biasGradient;
    }
}
So I create a buffer (m_pGradients) with the same dimensions as the input buffer, which is used to feed the gradients back to the previous layer, but I use the gradient sum to adjust the weights.
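For completeness, m_pGradients is sized to match the input volume and each m_GradientSum[layer] to match its kernel, and the intent is to clear both before each pass, roughly like this (a sketch only; inputWidth/inputHeight/inputDepth are placeholder names rather than my actual members):

    // Sketch: clear the accumulation buffers before a training pass
    // (needs <algorithm> for std::fill; inputWidth/inputHeight/inputDepth are placeholders).
    std::fill(m_pGradients.get(),
              m_pGradients.get() + inputWidth*inputHeight*inputDepth, 0.0f);
    for (int layer = 0; layer < m_Filters.size(); ++layer)
    {
        std::fill(m_GradientSum[layer].get(),
                  m_GradientSum[layer].get() + m_FilterSize*m_FilterSize*inputDepth, 0.0f);
    }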
The max pooling layer passes the gradients back like so (it saves the max indices and zeroes out all the other gradients):
void MaxPooling::Train(std::shared_ptr<float> input, std::shared_ptr<float> outputGradients, float label)
{
    for (int outputVolumeIndex = 0; outputVolumeIndex < m_OutputVolumeSize; ++outputVolumeIndex)
    {
        // Route each output gradient back to the input element that won the max
        int inputIndex = m_Indices.get()[outputVolumeIndex];
        m_pGradients.get()[inputIndex] = outputGradients.get()[outputVolumeIndex];
    }
}
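For context, the forward pass is what fills in m_Indices; it looks roughly like this for a single channel (a simplified sketch, where inputWidth/inputHeight and m_Output are placeholder names and the real code also loops over depth):

    // Simplified forward pass for one channel (placeholder names, depth loop omitted)
    for (int y = 0; y < inputHeight; y += 2)
    {
        for (int x = 0; x < inputWidth; x += 2)
        {
            // Scan the 2x2 window and remember the flat index of the maximum
            int best = y*inputWidth + x;
            for (int v = 0; v < 2; ++v)
                for (int u = 0; u < 2; ++u)
                    if (input.get()[(y+v)*inputWidth + (x+u)] > input.get()[best])
                        best = (y+v)*inputWidth + (x+u);

            int outputIndex = (y/2)*(inputWidth/2) + (x/2);
            m_Output.get()[outputIndex]  = input.get()[best];
            m_Indices.get()[outputIndex] = best; // consumed by Train above
        }
    }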
And the final regression layer calculates its gradients like this:
void LinearClassifier::Train(std::shared_ptr<float> data, std::shared_ptr<float> output, float y)
{
    float* x = data.get();
    float biasError = 0.0f;

    // Residual between the prediction and the label
    float h = Hypothesis(output) - y;

    for (int i = 1; i < m_NumberOfWeights; ++i)
    {
        float error = h*x[i];
        m_pGradients.get()[i] = error;
        biasError += error;
    }

    float cost = h;
    m_Error = cost*cost;

    // Gradient descent step on the weights, then the bias
    for (int theta = 1; theta < m_NumberOfWeights; ++theta)
    {
        m_pWeights.get()[theta] = m_pWeights.get()[theta] - learningRate*m_pGradients.get()[theta];
    }
    m_pWeights.get()[0] -= learningRate*biasError;
}
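Just to spell out the math I'm following for that layer: with a squared-error cost the per-weight gradient is the residual times the corresponding input, which is what the loop above computes (ignoring the constant factor of 2, which only rescales the learning rate):

    cost(θ)     = (h(x) - y)^2
    ∂cost/∂θ_i  = 2 * (h(x) - y) * x_i     // i.e. proportional to h * x[i] in the loop above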
After 100 iterations of training on the two examples, the prediction for each is the same as for the other and unchanged from the start.
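For completeness, the outer loop that produces those iterations chains the layers' Train calls, passing each layer's gradient buffer back to the layer before it, roughly like this (a sketch; the Gradients() accessor and the intermediate output buffers are placeholder names for my actual wiring):

    // Rough shape of one training iteration (placeholder names, not the exact code)
    for (int iteration = 0; iteration < 100; ++iteration)
    {
        for (auto& example : trainingSet)
        {
            // Forward pass through conv1 -> pool1 -> conv2 -> pool2 -> classifier,
            // producing conv1Output, pool1Output, conv2Output, pool2Output, classifierOutput.

            // Backward pass: each layer consumes the gradients of the layer after it.
            classifier.Train(pool2Output, classifierOutput, example.label);
            pool2.Train(conv2Output, classifier.Gradients(), example.label);
            conv2.Train(pool1Output, pool2.Gradients(), example.label);
            pool1.Train(conv1Output, conv2.Gradients(), example.label);
            conv1.Train(example.image, pool1.Gradients(), example.label);
        }
    }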
- Should a convolutional network like this be able to discriminate between the two classes?
- Is this the correct approach?
- Should I be accounting for the ReLU (max) in the convolution layer backpropagation? Something like the sketch below is what I have in mind.
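That is, masking the incoming gradients with the ReLU derivative before using them, along these lines (a sketch; m_pOutput is a placeholder for wherever the forward-pass activations of this layer would be stored):

    // Sketch: zero the incoming gradient wherever the ReLU clamped the activation to zero
    // (m_pOutput is a placeholder for the stored forward-pass output of this layer).
    int outputSize = m_Filters.size()*m_OutputWidth*m_OutputHeight;
    for (int i = 0; i < outputSize; ++i)
    {
        if (m_pOutput.get()[i] <= 0.0f)
            outputGradients.get()[i] = 0.0f;   // d/dx max(0, x) is 0 for x <= 0
    }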