I have implemented a neural network (using CUDA) with 2 layers (2 neurons per layer). I'm trying to make it learn two simple polynomial functions using backpropagation.
But instead of converging, it is diverging (the output is becoming infinity).
Here are some more details about what I've tried:
- I had set the initial weights to 0, but since it was diverging I have randomized them
- I read that a neural network might diverge if the learning rate is too high, so I reduced the learning rate to 0.000001
- The two functions I am trying to get it to learn are 3 * i + 7 * j + 9 and j * j + i * i + 24 (I am giving the layer i and j as input; the functions are written out as code after this list)
- I had implemented it as a single layer previously and that could approximate the polynomial functions better
- I am thinking of implementing momentum in this network but I'm not sure it would help it learn
- I am using a linear activation function (as in, no nonlinearity at all)
- There is oscillation in the beginning, but the output starts diverging the moment any of the weights become greater than 1
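For reference, here are the two target functions written out as plain code (just the formulas; this is not my actual CUDA training code):

```cpp
// The two functions the network should learn, written out for reference.
float target1(float i, float j) { return 3 * i + 7 * j + 9; }
float target2(float i, float j) { return j * j + i * i + 24; }
```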
I have checked and rechecked my code but there doesn't seem to be any kind of issue with it.
So here's my question: what is going wrong here?
Any pointers would be appreciated.
The most common reason for neural network code to diverge is that the coder has forgotten to put the negative sign in the weight-update expression.
Another possible reason is a problem with the error expression used for calculating the gradients.
If neither of these is the issue, we would need to see the code to answer.
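To make both points concrete, here is a minimal sketch of a correct descent step for a single linear neuron trained on the first of your target functions. It is not your code and every name in it is made up for illustration; the parts to compare against your implementation are the delta convention and the minus sign in the update.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Minimal sketch (illustrative names, not the asker's code): a correct
// gradient-descent step for a single linear neuron
//     y = w0*i + w1*j + b
// trained on t = 3*i + 7*j + 9 with squared error E = 0.5 * (y - t)^2.
__global__ void sgd_step(float *w, const float *x, const float *t,
                         float lr, int n)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= n) return;

    float y     = w[0] * x[2 * k] + w[1] * x[2 * k + 1] + w[2];
    float delta = y - t[k];                 // dE/dy under this error convention

    // Descend the gradient: w -= lr * dE/dw.  Dropping the minus sign while
    // keeping delta = y - t climbs the error surface instead, and the weights
    // run off to infinity.  (atomicAdd is just the simplest way to accumulate
    // the batch gradient in this toy example.)
    atomicAdd(&w[0], -lr * delta * x[2 * k]);
    atomicAdd(&w[1], -lr * delta * x[2 * k + 1]);
    atomicAdd(&w[2], -lr * delta);
}

int main()
{
    const int n = 4;                        // tiny training set: (i, j) in {0, 1}^2
    float hx[2 * n], ht[n];
    for (int k = 0; k < n; ++k) {
        float i = (float)(k % 2), j = (float)(k / 2);
        hx[2 * k] = i;  hx[2 * k + 1] = j;
        ht[k] = 3 * i + 7 * j + 9;          // the first target function
    }

    float *x, *t, *w;
    cudaMalloc((void **)&x, sizeof hx);
    cudaMalloc((void **)&t, sizeof ht);
    cudaMalloc((void **)&w, 3 * sizeof(float));
    cudaMemcpy(x, hx, sizeof hx, cudaMemcpyHostToDevice);
    cudaMemcpy(t, ht, sizeof ht, cudaMemcpyHostToDevice);
    cudaMemset(w, 0, 3 * sizeof(float));    // start from zero weights

    for (int epoch = 0; epoch < 5000; ++epoch)
        sgd_step<<<1, n>>>(w, x, t, 0.1f, n);

    float hw[3];
    cudaMemcpy(hw, w, sizeof hw, cudaMemcpyDeviceToHost);
    printf("w0 = %f  w1 = %f  b = %f  (should approach 3, 7, 9)\n",
           hw[0], hw[1], hw[2]);
    return 0;
}
```

With the minus sign removed from the three updates, this same program overflows to infinity instead of settling near 3, 7 and 9.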
If the problem you are trying to solve is of the classification type, try a 3-layer network (3 is enough according to Kolmogorov). The connections from inputs A and B to a hidden node C (C = A*wa + B*wb) represent a line in AB space. That line divides the correct and incorrect half-spaces. The connections from the hidden layer to the output put the hidden-layer values in correlation with each other, giving you the desired output.
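As a toy illustration of that geometric picture (weights made up for the example), a node with wa = 1 and wb = -1 is positive for points with A > B and negative for points with A < B, so its sign tells you which side of the line A = B an input falls on:

```cpp
#include <cstdio>

// Toy illustration (made-up weights): the hidden node C = A*wa + B*wb with
// wa = 1, wb = -1 separates the plane along the line A = B.
float hidden(float A, float B, float wa, float wb)
{
    return A * wa + B * wb;
}

int main()
{
    printf("%+.1f\n", hidden(0.9f, 0.2f, 1.0f, -1.0f));  // +0.7 : the A > B side
    printf("%+.1f\n", hidden(0.3f, 0.8f, 1.0f, -1.0f));  // -0.5 : the A < B side
    return 0;
}
```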
Depending on your data, the error function may look like a hair comb, so implementing momentum should help. Keeping the learning rate at 1 proved optimal for me.
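A minimal sketch of such a momentum update (the names and the 0.9 factor are illustrative choices, not taken from any particular implementation):

```cpp
#include <cstdio>

// Sketch of a momentum update (illustrative names).  The velocity v keeps a
// decaying memory of earlier steps, so the weights can coast through the
// narrow teeth of a comb-shaped error surface instead of oscillating in one.
void momentum_step(float *w, float *v, const float *grad,
                   float lr, float mu, int n)
{
    for (int i = 0; i < n; ++i) {
        v[i] = mu * v[i] - lr * grad[i];   // mu around 0.9 is a common choice
        w[i] += v[i];                      // step = fresh gradient + carried momentum
    }
}

int main()
{
    float w[2] = {0, 0}, v[2] = {0, 0}, g[2] = {1, -2};
    for (int step = 0; step < 3; ++step) {
        momentum_step(w, v, g, 0.1f, 0.9f, 2);
        printf("w = (%.3f, %.3f)\n", w[0], w[1]);
    }
    return 0;
}
```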
Your training sessions will get stuck in local minima every once in a while, so the training will consist of a few consecutive sessions. If a session exceeds the maximum number of iterations, or the amplitude gets too high, or the error is obviously high, that session has failed: start another one.
At the beginning of each session, reinitialize your weights with random values in (-0.5, +0.5).
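Roughly, that outer loop could look like the skeleton below (a sketch only: the names and thresholds are placeholders, and run_epoch stands in for one full pass of your own backprop code):

```cpp
#include <cstdlib>

static float frand() { return (float)rand() / RAND_MAX - 0.5f; }   // [-0.5, +0.5)

// One training session: fresh random weights, then iterate until the error is
// low enough or the session is judged a failure and should be restarted.
bool train_session(float *w, int n_weights,
                   float (*run_epoch)(float *),   // hypothetical training pass
                   int max_iters, float target_err)
{
    for (int i = 0; i < n_weights; ++i)
        w[i] = frand();                            // reinitialize every session

    for (int iter = 0; iter < max_iters; ++iter) {
        float err = run_epoch(w);
        if (err < target_err) return true;         // converged
        if (err > 1e6f) return false;              // obviously diverging: abort early
    }
    return false;                                  // out of iterations: stuck, restart
}

void train(float *w, int n_weights, float (*run_epoch)(float *))
{
    while (!train_session(w, n_weights, run_epoch, 10000, 1e-3f))
        ;                                          // failed session: start another
}

// Stand-in for a real training pass; it just pretends the error shrinks.
static float fake_epoch(float *) { static float e = 1.0f; return e *= 0.9f; }

int main()
{
    float w[6];
    train(w, 6, fake_epoch);
    return 0;
}
```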
It really helps to chart your error descent. You will get that "Aha!" factor.