I have implemented a neural network (using CUDA) with 2 layers (2 neurons per layer). I'm trying to make it learn two simple polynomial functions using backpropagation.
But instead of converging, it is diverging (the output grows to infinity).
Here are some more details about what I've tried:
- I had set the initial weights to 0, but since it was diverging I have randomized the initial weights (a sketch of the initialization is below this list)
- I read that a neural network might diverge if the learning rate is too high, so I reduced the learning rate to 0.000001 (the update step I use is also sketched below this list)
- The two functions I am trying to get it to learn are:
  3 * i + 7 * j + 9
  and j*j + i*i + 24
  (I am giving the network i and j as input)
- I had implemented it as a single layer previously, and that version could approximate the polynomial functions better
- I am thinking of implementing momentum in this network, but I'm not sure it would help it learn (a sketch of what I have in mind is below this list)
- I am using a linear activation function (as in, effectively no activation function)
- There is some oscillation at the beginning, but the output starts diverging the moment any of the weights becomes greater than 1
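For reference, the initialization I switched to looks roughly like this on the host side (the names are illustrative, not my actual code; the values are copied to the device afterwards):

```
#include <cstdlib>

// Randomize weights to small values in [-0.5, 0.5] instead of all zeros.
void init_weights(float *w, int n)
{
    for (int k = 0; k < n; ++k)
        w[k] = (float)rand() / RAND_MAX - 0.5f;
}
```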
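To make the setup concrete, here is a minimal single-thread sketch of the training step I described: a 2-2-2 network with linear activations, squared error, and plain SGD. The names (W1, b1, W2, b2, lr) are illustrative; my real kernel is parallelized, but the math is the same:

```
// One SGD step for a 2-input, 2-hidden, 2-output linear network.
// Weights are stored row-major: W[2*n + m] connects input m to neuron n.
__global__ void train_step(float *W1, float *b1,   // hidden layer: 2x2 weights, 2 biases
                           float *W2, float *b2,   // output layer: 2x2 weights, 2 biases
                           float i, float j,       // inputs
                           float t0, float t1,     // targets: 3*i+7*j+9 and j*j+i*i+24
                           float lr)               // learning rate, e.g. 0.000001
{
    // Forward pass (linear activations): h = W1*x + b1, y = W2*h + b2.
    float x[2] = { i, j };
    float h[2], y[2];
    for (int n = 0; n < 2; ++n)
        h[n] = W1[2*n] * x[0] + W1[2*n+1] * x[1] + b1[n];
    for (int n = 0; n < 2; ++n)
        y[n] = W2[2*n] * h[0] + W2[2*n+1] * h[1] + b2[n];

    // Backward pass for squared error: dL/dy = y - t.
    float t[2] = { t0, t1 };
    float dy[2] = { y[0] - t[0], y[1] - t[1] };
    float dh[2] = { 0.0f, 0.0f };
    for (int n = 0; n < 2; ++n) {
        // Accumulate the hidden-layer error before touching W2.
        dh[0] += dy[n] * W2[2*n];
        dh[1] += dy[n] * W2[2*n+1];
        // Output-layer update.
        W2[2*n]   -= lr * dy[n] * h[0];
        W2[2*n+1] -= lr * dy[n] * h[1];
        b2[n]     -= lr * dy[n];
    }
    for (int n = 0; n < 2; ++n) {
        // Hidden-layer update (derivative of a linear activation is 1).
        W1[2*n]   -= lr * dh[n] * x[0];
        W1[2*n+1] -= lr * dh[n] * x[1];
        b1[n]     -= lr * dh[n];
    }
}
```

I launch this once per training sample, e.g. train_step<<<1,1>>>(...), since the network is tiny.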
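And this is the kind of momentum update I am considering: each weight gets a velocity buffer v, and the step follows the velocity instead of the raw gradient (mu would be something like 0.9; again, the names are illustrative):

```
// Hypothetical momentum update: replace w -= lr * grad with the two lines below.
__global__ void momentum_update(float *w, float *v, const float *grad,
                                int n, float lr, float mu)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k < n) {
        v[k] = mu * v[k] - lr * grad[k];  // decaying running average of past gradients
        w[k] += v[k];                     // step along the velocity
    }
}
```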
I have checked and rechecked my code, but there doesn't seem to be any issue with it.
So here's my question: what is going wrong here?
Any pointers would be appreciated.