The method zero_grad() needs to be called during training. But the documentation is not very helpful:

| zero_grad(self)
|     Sets gradients of all model parameters to zero.

Why do we need to call this method?
In PyTorch, we need to set the gradients to zero before starting backpropagation because PyTorch accumulates the gradients on subsequent backward passes. This is convenient while training RNNs. So, the default action is to accumulate (i.e. sum) the gradients on every loss.backward() call.

Because of this, when you start your training loop, ideally you should zero out the gradients so that you do the parameter update correctly. Otherwise, the gradient would point in some direction other than the intended direction towards the minimum (or maximum, in case of maximization objectives).

Here is a simple example:
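A minimal sketch, assuming a small linear model, an SGD optimizer, and MSE loss (model, data, and targets are placeholders for illustration):

```python
import torch

# Placeholder model, optimizer, and data for illustration
model = torch.nn.Linear(4, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

data = torch.randn(16, 4)
targets = torch.randn(16, 3)

for sample, target in zip(data, targets):
    optimizer.zero_grad()           # clear gradients left over from the previous step
    output = model(sample)
    loss = loss_fn(output, target)
    loss.backward()                 # gradients are *added* into each parameter's .grad
    optimizer.step()                # update parameters using the freshly computed gradients
```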
Alternatively, if you're doing vanilla gradient descent, then:
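A sketch of the same loop without an optimizer, assuming hand-defined parameters W and b and a fixed learning rate; here the gradients have to be zeroed manually:

```python
import torch

learning_rate = 0.1
W = torch.randn(4, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

data = torch.randn(16, 4)
targets = torch.randn(16, 3)

for sample, target in zip(data, targets):
    # Manually clear the gradients before the backward pass
    # (.grad is None until the first backward call, hence the checks)
    if W.grad is not None:
        W.grad.zero_()
    if b.grad is not None:
        b.grad.zero_()

    output = sample @ W + b
    loss = ((output - target) ** 2).sum()
    loss.backward()

    # Vanilla gradient descent update, done outside the autograd graph
    with torch.no_grad():
        W -= learning_rate * W.grad
        b -= learning_rate * b.grad
```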
Note: The accumulation (i.e. sum) of gradients happens when .backward() is called on the loss tensor.
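To see the accumulation itself, here is a small illustrative snippet (the values in the comments assume this exact toy graph):

```python
import torch

x = torch.ones(2, requires_grad=True)
y = (3 * x).sum()

y.backward(retain_graph=True)
print(x.grad)     # tensor([3., 3.])

y.backward()      # the new gradient is summed into x.grad
print(x.grad)     # tensor([6., 6.])

x.grad.zero_()    # this is what zero_grad() does for every parameter
print(x.grad)     # tensor([0., 0.])
```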