I want to make a model that predicts the future response of an input signal. The architecture of my network is [3, 5, 1]:
- 3 inputs,
- 5 neurons in the hidden layer, and
- 1 neuron in the output layer.
My questions are:
- Should we have a separate BIAS for each hidden and output layer?
- Should we assign a weight to the BIAS at each layer (since the BIAS adds an extra value to the network and could overburden it)?
- Why is the BIAS always set to one? If eta can take different values, why don't we set the BIAS to different values?
- Why do we always use the log-sigmoid function as the nonlinearity? Can we use tanh?
So, I think it'd clear most of this up if we were to step back and discuss the role the bias unit is meant to play in a NN.
A bias unit is meant to allow units in your net to learn an appropriate threshold (i.e. after reaching a certain total input, they start sending positive activation), since normally a positive total input means a positive activation.
For example, if your bias unit has a weight of -2 going into some neuron x, then neuron x will only provide a positive activation when all of its other input adds up to more than 2.
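To make that concrete, here is a minimal Python sketch (made-up weights; `neuron_output` is just an illustrative name) of a single neuron whose bias weight acts as its learned threshold:

```python
import numpy as np

def neuron_output(inputs, weights, bias_weight):
    # The bias unit always emits 1; its weight shifts the neuron's threshold.
    total_input = np.dot(inputs, weights) + 1.0 * bias_weight
    return 1.0 / (1.0 + np.exp(-total_input))  # log-sigmoid activation

# With a bias weight of -2, the other inputs must sum to more than 2
# before the total input turns positive (i.e. the output rises above 0.5).
x = np.array([0.5, 1.0, 1.0])   # sums to 2.5
w = np.array([1.0, 1.0, 1.0])
print(neuron_output(x, w, bias_weight=-2.0))  # total input = 0.5, output ~ 0.62
```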
So, with that as background, your answers:
- No, one bias input is always sufficient, since it can affect different neurons differently depending on its weight with each unit.
- Generally speaking, having a bias weight going to every non-input unit is a good idea, since otherwise units without a bias weight would have thresholds that are always zero (see the sketch after this list).
- Because the threshold, once learned, should be consistent across trials. Remember that the bias represents how each unit interacts with the input; it isn't an input itself.
- You certainly can, and many do; any squashing function generally works as an activation function.
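Putting those answers together, a rough sketch of a [3, 5, 1] forward pass might look like the following (NumPy, weights randomly initialised purely for illustration; note the single bias value of 1 but a separate, learnable bias weight per hidden and output neuron, with tanh used in place of the log-sigmoid):

```python
import numpy as np

rng = np.random.default_rng(0)

# [3, 5, 1] network: the bias "input" is always 1, but every hidden and
# output neuron gets its own bias weight, so one shared bias unit can
# still set a different threshold for each of them.
W_hidden = rng.normal(size=(5, 3))   # input -> hidden weights
b_hidden = rng.normal(size=5)        # bias weights of the 5 hidden neurons
W_out = rng.normal(size=(1, 5))      # hidden -> output weights
b_out = rng.normal(size=1)           # bias weight of the output neuron

def forward(x):
    # tanh is a squashing function too, and works in place of the log-sigmoid
    h = np.tanh(W_hidden @ x + b_hidden)
    return W_out @ h + b_out          # linear output for predicting the signal

print(forward(np.array([0.1, -0.4, 0.7])))
```

The bias weights here are trained exactly like any other weight; only the bias value itself stays fixed at 1.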