I'm working on training a convolutional neural network (CNN). At the very beginning of training the training error is extremely high; after that it decreases slowly, and after approximately 500 epochs it comes close to zero (e.g. 0.006604). When I then take the final model and measure its accuracy on the testing data, I get about 89.50%.
Does that seem normal? I mean, getting such a high training error rate at the very beginning of the training process.
Another thing I'd like to mention: I've noticed that every time I decrease the number of hidden nodes, the results at the end of training get better.
My CNN structure is:
config.forward_pass_scheme = {'conv_v', 'pool', 'conv_v', 'pool', 'conv_v', 'pool', 'conv_v','full', 'full', 'full', 'out'};
Here are some of my hyperparameters:
config.learning_rate = 0.01;
config.weight_range = 2;
config.decay = 0.0005;
config.normalize_init_weights = 1;
config.dropout_full_layer = 1;
config.optimization = 'adagrad';
Your help and suggestions in this regard would be highly appreciated; thank you in advance.
If you have a large number of hidden units in the fully connected (fc) layers and not enough training data, the network will overfit the training set. Convolutional layers are less prone to overfitting because they have far fewer parameters (the filter weights are shared across spatial locations). Reducing the number of hidden units in the fc layers therefore reduces overfitting, which is consistent with what you observed. Hyperparameters such as the number of hidden nodes in an fc layer are usually tuned on a held-out validation set, so that the chosen model generalizes well to the test set. Dropout helps reduce overfitting in the fc layers, but it may not be sufficient if you add too many hidden units.
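Below is a minimal sketch (not tied to any particular CNN library) of what such a validation-based search could look like. train_cnn and error_rate are hypothetical placeholders for your library's training and evaluation calls, and the candidate layer sizes and 20% split are only example values.

% Hold out part of the training data as a validation set and pick the fc
% layer size with the lowest validation error. train_cnn / error_rate are
% hypothetical placeholders; train_x / train_y are your data and labels.
n     = size(train_x, 1);
idx   = randperm(n);
n_val = round(0.2 * n);                  % hold out 20% for validation
val   = idx(1:n_val);
trn   = idx(n_val+1:end);

best_err = inf;
for h = [64 128 256 512]                 % candidate fc hidden layer sizes
    model = train_cnn(train_x(trn,:), train_y(trn,:), h);        % hypothetical
    err   = error_rate(model, train_x(val,:), train_y(val,:));   % hypothetical
    if err < best_err
        best_err    = err;
        best_hidden = h;
    end
end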
Yes, a high training error at the very beginning is expected. CNNs are trained with stochastic optimization starting from randomly initialized weights, so it takes many updates before the parameters settle and the error comes down.
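For intuition, here is a generic sketch of a single Adagrad-style update (the optimizer named in your config). This only illustrates the idea of stochastic, per-batch updates; it is not your library's actual implementation.

% Generic Adagrad-style update for one mini-batch (illustration only).
% grad is the gradient of the loss w.r.t. the weights w for this batch;
% g_acc accumulates squared gradients and starts at zeros(size(w)).
g_acc = g_acc + grad.^2;                                   % running sum of squared gradients
w     = w - learning_rate * grad ./ (sqrt(g_acc) + 1e-8);  % per-weight scaled step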
The more hidden layers/neurons you have, the more weights there are in your NN, which means more parameters to model your problem. So it makes sense that adding neurons/layers improves learning on the training set.
On the other hand, more weights to train means a higher risk of overfitting. A common rule of thumb is to have at least 5 times as many samples in your training set as weights in your NN.
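As a rough illustration of that rule of thumb, the sketch below counts the weights in the fully connected part of a network and compares the count with the training set size. The layer widths and sample count are made-up examples, not your actual configuration.

% Count weights in the fc layers and compare with the training set size.
fc_sizes  = [1024 512 256 10];           % example fc layer widths, ending with the output layer
n_weights = 0;
for k = 1:numel(fc_sizes)-1
    n_weights = n_weights + fc_sizes(k) * fc_sizes(k+1) + fc_sizes(k+1);  % weights + biases
end
n_train = 50000;                         % example training set size
fprintf('fc weights: %d, samples: %d, samples/weights: %.2f\n', ...
        n_weights, n_train, n_train / n_weights);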
Another way to prevent overfitting is to use dropout.
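For reference, here is a minimal sketch of inverted dropout applied to a fully connected activation vector a. This is a generic illustration, not how your dropout_full_layer option is implemented internally.

% Inverted dropout on fc activations a: each unit is zeroed with probability p
% during training, and the survivors are rescaled so nothing changes at test time.
p    = 0.5;                              % drop probability
mask = (rand(size(a)) > p) / (1 - p);    % random keep-mask, rescaled by 1/(1-p)
a    = a .* mask;                        % apply dropout to the activations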
Btw, which library do you use?