What does the number of hidden layers in a multilayer perceptron neural network do to the way the network behaves? And the same question for the number of nodes in the hidden layers?
Let's say I want to use a neural network for handwritten character recognition. In that case I would put pixel colour intensity values as input nodes and character classes as output nodes.
How would I choose the number of hidden layers and nodes to solve such a problem?
Note: this answer was correct at the time it was made, but has since become outdated.
It is rare to have more than two hidden layers in a neural network. The number of layers will usually not be a parameter of your network that you worry much about.
Bengio, Y. & LeCun, Y., 2007. Scaling learning algorithms towards AI. Large-Scale Kernel Machines, (1), pp.1-41.
The cited paper is a good reference for learning about the effect of network depth, recent progress in training deep networks, and deep learning in general.
The general answer for picking hyperparameters is to cross-validate: hold out some data, train networks with different configurations, and use the one that performs best on the held-out set.
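As a minimal sketch of that procedure, assuming scikit-learn and its small built-in digits dataset (the candidate topologies below are arbitrary examples, not recommendations):

```python
# Hold-out validation for choosing a hidden-layer topology (scikit-learn assumed).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)  # 8x8 pixel intensities, 10 digit classes
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

best_score, best_layers = -1.0, None
for layers in [(32,), (64,), (128,), (64, 64)]:  # candidate topologies
    clf = MLPClassifier(hidden_layer_sizes=layers, max_iter=500, random_state=0)
    clf.fit(X_train, y_train)
    score = clf.score(X_val, y_val)  # accuracy on the held-out set
    if score > best_score:
        best_score, best_layers = score, layers

print("best topology:", best_layers, "validation accuracy:", best_score)
```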
Besides the fact that cross-validation over different model configurations (number of hidden layers or neurons per layer) will lead you to a better configuration, another approach is to train a model as big and deep as possible and use dropout regularization to turn off some neurons and reduce overfitting.
The reference for this approach is 'Dropout: A Simple Way to Prevent Neural Networks from Overfitting' by Srivastava et al.: https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf
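A minimal sketch of that idea, assuming PyTorch (the layer widths and the 0.5 dropout rate are illustrative choices, not values from the paper):

```python
import torch.nn as nn

# A deliberately large MLP for 28x28 character images, regularized with dropout.
model = nn.Sequential(
    nn.Linear(784, 1024), nn.ReLU(), nn.Dropout(p=0.5),  # randomly zero 50% of units
    nn.Linear(1024, 1024), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(1024, 10),  # one output per character class
)
# model.train() enables dropout during training; model.eval() disables it for prediction.
```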
Most of the problems I have seen were solved with 1-2 hidden layers. It has been proven that MLPs with only one hidden layer are universal function approximators (Hornik et al.). More hidden layers can make the problem easier or harder; you usually have to try different topologies. I have heard that you cannot add an arbitrary number of hidden layers if you want to train your MLP with backprop, because the gradient becomes too small in the first layers (this is known as the vanishing gradient problem; I have no reference for that). But there are some applications where people have used up to nine layers. Maybe you are interested in a standard benchmark problem which is solved by different classifiers and MLP topologies.
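To make the gradient claim concrete, here is a small sketch, assuming PyTorch (the depth and layer width are arbitrary): with sigmoid activations stacked nine layers deep, the per-layer weight-gradient norms printed below typically shrink toward the first layers.

```python
import torch
import torch.nn as nn

# Nine sigmoid hidden layers, echoing the "up to nine layers" remark above.
layers = []
for _ in range(9):
    layers += [nn.Linear(64, 64), nn.Sigmoid()]
net = nn.Sequential(*layers)

x = torch.randn(32, 64)
net(x).sum().backward()  # one backward pass through the whole stack

# Gradient norms typically decrease toward the earliest layers.
for i, layer in enumerate(net):
    if isinstance(layer, nn.Linear):
        print(f"layer {i // 2}: weight-gradient norm = {layer.weight.grad.norm():.3e}")
```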
All the above answers are of course correct, but just to add some more ideas: some general rules follow, based on the paper 'Approximating Number of Hidden layer neurons in Multiple Hidden Layer BPNN Architecture' by Saurabh Karsoliya.
In general:
- The number of hidden-layer neurons is 2/3 (or 70% to 90%) of the size of the input layer. If this is insufficient, the number of output-layer neurons can be added later on.
- The number of hidden-layer neurons should be less than twice the number of neurons in the input layer.
- The size of the hidden layer should be between the input layer size and the output layer size.
Always keep in mind that you need to explore and try a lot of different combinations. Also, using GridSearch you can find the "best model and parameters"; e.g., we can do a GridSearch to determine the "best" size of the hidden layer.
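For instance, a minimal sketch assuming scikit-learn's GridSearchCV and MLPClassifier (the candidate hidden-layer sizes are arbitrary illustrations):

```python
# Grid search over hidden-layer sizes with 5-fold cross-validation (scikit-learn assumed).
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
param_grid = {"hidden_layer_sizes": [(32,), (64,), (128,), (64, 32)]}  # candidate sizes
search = GridSearchCV(MLPClassifier(max_iter=500, random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```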