What is the correct way to perform gradient clipping in pytorch?
I have an exploding gradients problem, and I need to program my way around it.
Reading through the forum discussion gave this:
I'm sure there is more to it than just this code snippet.
clip_grad_norm (which is actually deprecated in favor of clip_grad_norm_, following the more consistent syntax of a trailing _ when in-place modification is performed) clips the norm of the overall gradient by concatenating all parameters passed to the function, as can be seen from the documentation.
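For instance, here is a minimal sketch of how clip_grad_norm_ is typically used in a training step; the model, optimizer, data and max_norm value are made-up placeholders, and the clipping is applied after backward() and before optimizer.step():

```python
import torch
import torch.nn as nn

# Hypothetical model, optimizer and data, only to make the snippet runnable.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
inputs, targets = torch.randn(8, 10), torch.randn(8, 1)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(inputs), targets)
loss.backward()

# Rescale all gradients so that their combined 2-norm is at most max_norm.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

optimizer.step()
```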
From your example it looks like you want clip_grad_value_ instead, which has a similar syntax and also modifies the gradients in-place.
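For example, assuming the same placeholder model, optimizer and data as above, element-wise clipping could look like this (the clip_value of 0.5 is an arbitrary choice):

```python
optimizer.zero_grad()
loss = nn.functional.mse_loss(model(inputs), targets)
loss.backward()

# Clamp each gradient element to the range [-clip_value, clip_value].
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=0.5)

optimizer.step()
```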
Another option is to register a backward hook. The hook takes the current gradient as an input and may return a tensor, which will be used in place of the previous gradient, i.e. modifying it. The hook is called each time after a gradient with respect to that tensor has been computed, so there is no need to clip manually once the hook has been registered.
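A minimal sketch of that approach, again with placeholder names and an arbitrarily chosen clip_value:

```python
clip_value = 0.5
for p in model.parameters():
    # The hook runs whenever a gradient for p is computed and replaces it
    # with the clamped tensor it returns.
    p.register_hook(lambda grad: torch.clamp(grad, -clip_value, clip_value))
```

With the hooks in place, a plain loss.backward() followed by optimizer.step() already works with the clipped gradients.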
A more complete example can be found in the linked discussion.
Source: https://github.com/pytorch/pytorch/issues/309