Can you clarify, for me and for others who need to understand it, the definition of non-trainable params in a model?
For example, when you build your own model, the value is 0 by default, but when you use an Inception model, it becomes something other than 0. What is the reason behind this?
Thank you very much for your clarification in advance.
It is clear that if you freeze any layer of the network, all params on that frozen layer become non-trainable. On the other hand, if you design your network from scratch, it might have some non-trainable parameters too. For instance, a batch normalization layer has 4 kinds of parameters, which are:
[gamma weights, beta weights, moving_mean, moving_variance]
The first two of them are trainable but the last two are not. So a batch normalization layer is most probably the reason that your custom network has non-trainable parameters.
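To make the split concrete, here is a minimal sketch that lists the weights of a batch normalization layer by kind. It assumes TensorFlow 2.x with its bundled Keras, and the feature size of 10 is an arbitrary choice for illustration.

```python
from tensorflow import keras

# A BatchNormalization layer over 10 features (the size is an assumption).
bn = keras.layers.BatchNormalization()
model = keras.Sequential([keras.Input(shape=(10,)), bn])

# gamma and beta are updated by gradient descent.
trainable_names = [w.name for w in bn.trainable_weights]
# moving_mean and moving_variance are updated from batch statistics instead.
non_trainable_names = [w.name for w in bn.non_trainable_weights]

print(trainable_names)
print(non_trainable_names)
model.summary()  # reports the trainable / non-trainable totals
```

The exact weight names printed depend on the Keras version, but each list should hold two entries of 10 values each.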
Non-trainable parameters are quite a broad subject. A straightforward example is to consider the case of any specific NN model and its architecture.
Say we have already set up a network definition in Keras, and the architecture is something like 256->500->500->1. Based on this definition, we seem to have a regression model (one output) with two hidden layers (500 nodes each) and an input of 256.
One non-trainable parameter of this model is, for example, the number of hidden layers itself (2). Another could be the number of nodes in each hidden layer (500 in this case), or even the number of nodes in each individual layer, giving you one parameter per layer plus the number of layers itself.
These parameters (more commonly called hyperparameters) are "non-trainable" because you can't optimize their values with your training data. Training algorithms (like back-propagation) optimize and update the weights of your network, which are the actual trainable parameters here (usually several thousand or more, depending on your connections). Your training data as it is can't help you determine those non-trainable parameters.
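As a rough illustration of how many weights the 256->500->500->1 architecture has, the count can be worked out by hand. This is a sketch that assumes fully connected layers with one bias per node; dense_param_count is a hypothetical helper, not a Keras function.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a fully connected stack.

    Each pair of consecutive layers contributes n_in * n_out weights
    plus n_out biases (assumes every layer uses a bias).
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


total = dense_param_count([256, 500, 500, 1])
print(total)  # 128500 + 250500 + 501 = 379501
```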
However, this does not mean that numberHiddenLayers is not trainable at all; it only means that in this model and its implementation we are unable to do so. We could make numberHiddenLayers trainable; the easiest way would be to define another ML algorithm that takes this model as input and trains it with several values of numberHiddenLayers. The best value is obtained with the model that outperformed the others, thus optimizing the numberHiddenLayers variable.
In other words, non-trainable parameters of a model are those that you will not be updating and optimizing during training, and that have to be defined a priori, or passed as inputs.
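The outer search described above can be sketched as a simple loop over candidate values. Everything here is hypothetical: pick_num_hidden_layers and toy_score are illustrative names, and toy_score is a made-up stand-in for "build the model with this many hidden layers, train it, and return its validation score".

```python
def pick_num_hidden_layers(candidates, train_and_score):
    """Try each candidate value and keep the one with the best score."""
    scores = {n: train_and_score(n) for n in candidates}
    best = max(scores, key=scores.get)
    return best, scores


def toy_score(num_hidden_layers):
    # Placeholder only: a real version would train a model and evaluate it.
    # This invented score simply peaks at 2 hidden layers.
    return -abs(num_hidden_layers - 2)


best, scores = pick_num_hidden_layers([1, 2, 3, 4], toy_score)
print(best, scores)
```

In practice the scoring function would be the expensive part, since every candidate requires training a full model.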
There are some details that other answers do not cover.
In Keras, non-trainable parameters are the ones that are not trained using gradient descent. This is also controlled by the trainable parameter in each layer. For example, the summary of a model whose layer has trainable set to False prints zero trainable parameters and 1010 non-trainable parameters.
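A minimal sketch of this behaviour, assuming TensorFlow 2.x and a single Dense layer with 100 inputs and 10 units (an assumed configuration that gives 100*10 + 10 = 1010 parameters). count_params_by_kind is a hypothetical helper written for this example.

```python
import math
from tensorflow import keras


def count_params_by_kind(model):
    """Hypothetical helper: return (trainable, non_trainable) parameter counts."""
    def total(weights):
        return sum(math.prod(int(d) for d in w.shape) for w in weights)
    return total(model.trainable_weights), total(model.non_trainable_weights)


# One frozen Dense layer: 100 * 10 weights + 10 biases = 1010 parameters.
model = keras.Sequential([
    keras.Input(shape=(100,)),
    keras.layers.Dense(10, trainable=False),
])
frozen_counts = count_params_by_kind(model)

# Flipping the flag moves all of them to the trainable side.
model.layers[0].trainable = True
unfrozen_counts = count_params_by_kind(model)

print(frozen_counts, unfrozen_counts)
```

Calling model.summary() at either point reports the same totals in its footer.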
Now if you set the layer as trainable with model.layers[0].trainable = True, the summary shows that all parameters are trainable and there are zero non-trainable parameters.
But there are also layers that have both trainable and non-trainable parameters. One example is the BatchNormalization layer, where the mean and variance of the activations are stored for use at test time. For a BatchNormalization layer with 40 parameters in total, 20 are trainable and 20 are non-trainable. The 20 non-trainable parameters correspond to the computed mean and variance of the activations that are used at test time; these parameters will never be trainable using gradient descent, and are not affected by the trainable flag.
In Keras, non-trainable parameters (as shown in
model.summary()) are the weights that you have chosen to keep constant when training. This means that Keras won't update these weights during training.
Weights are the values inside the network that perform the operations and can be adjusted to result in what we want. The backpropagation algorithm changes the weights towards a lower error at the end.
By default, all weights in a keras model are trainable.
When you create layers, each layer internally creates its own weights, and they're trainable. (The backpropagation algorithm will update these weights.)
When you make them untrainable, the algorithm will not update these weights anymore. This is useful, for instance, when you want a convolutional layer with a specific filter, like a Sobel filter. You don't want the training to change this operation, so these weights/filters should be kept constant.
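As a sketch of the fixed-filter case, the following builds a convolutional layer whose kernel is set to a horizontal Sobel filter and then frozen. It assumes TensorFlow 2.x, a single-channel 8x8 input, and no bias; those choices are illustrative and not taken from the answer.

```python
import numpy as np
from tensorflow import keras

# Horizontal-gradient Sobel kernel.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype="float32")

# A frozen convolution with one 3x3 filter and no bias term.
conv = keras.layers.Conv2D(filters=1, kernel_size=3, padding="same",
                           use_bias=False, trainable=False)
model = keras.Sequential([keras.Input(shape=(8, 8, 1)), conv])

# Keras stores the kernel as (height, width, in_channels, out_channels).
conv.set_weights([sobel_x.reshape(3, 3, 1, 1)])

print(len(conv.trainable_weights), len(conv.non_trainable_weights))
```

Because the layer is not trainable, its kernel is listed under the non-trainable weights and training leaves the filter as it was set.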
There are many other reasons why you might want to make weights untrainable.
Changing parameters:
To decide whether weights are trainable or not, you take the layers from the model and set their trainable attribute. This must be done before compilation.
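A minimal sketch of that order of operations, assuming TensorFlow 2.x. The layer sizes, the optimizer, and the random data are placeholders chosen only so the example runs.

```python
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1),
])

# Freeze the first Dense layer, then compile so training respects the flag.
model.layers[0].trainable = False
model.compile(optimizer="sgd", loss="mse")

# Random placeholder data, only to run one training pass.
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4)).astype("float32")
y = rng.normal(size=(32, 1)).astype("float32")

weights_before = [w.copy() for w in model.layers[0].get_weights()]
model.fit(x, y, epochs=1, verbose=0)
weights_after = model.layers[0].get_weights()

# The frozen layer's kernel and bias are left untouched by training.
unchanged = all(np.array_equal(b, a)
                for b, a in zip(weights_before, weights_after))
print(unchanged)
```

If the flag is changed after compiling, calling compile again is the safe way to make sure the change is picked up.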