I'm trying to wrap my head around how gradients are represented and how autograd works:
import torch
from torch.autograd import Variable
x = Variable(torch.Tensor([2]), requires_grad=True)
y = x * x
z = y * y
z.backward()
print(x.grad)
#Variable containing:
#32
#[torch.FloatTensor of size 1]
print(y.grad)
#None
Why does it not produce a gradient for y? If y.grad = dz/dy, then shouldn't it at least produce a variable like y.grad = 2*y?
By default, gradients are only retained for leaf variables. Non-leaf variables' gradients are not retained to be inspected later. This was done by design, to save memory.
- Soumith Chintala
See: https://discuss.pytorch.org/t/why-cant-i-see-grad-of-an-intermediate-variable/94
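To make the leaf / non-leaf distinction concrete, here is a minimal sketch. It assumes PyTorch 0.4 or later, where Variable was merged into Tensor, so a plain tensor created with requires_grad=True plays the role of the Variable in the question:

```python
import torch

# Assumption: PyTorch >= 0.4, where tensors replace Variable.
x = torch.tensor([2.0], requires_grad=True)  # created by the user: a leaf
y = x * x                                    # result of an operation: not a leaf
z = y * y

print(x.is_leaf)  # True
print(y.is_leaf)  # False

z.backward()
print(x.grad)     # tensor([32.]), since dz/dx = 4 * x**3 = 32 at x = 2
```

Only x is a leaf here, which is why x.grad is populated after backward() while the gradient for y is discarded.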
Option 1:
Call y.retain_grad()
x = Variable(torch.Tensor([2]), requires_grad=True)
y = x * x
z = y * y
y.retain_grad()
z.backward()
print(y.grad)
#Variable containing:
# 8
#[torch.FloatTensor of size 1]
Source: https://discuss.pytorch.org/t/why-cant-i-see-grad-of-an-intermediate-variable/94/16
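The same approach written against the tensor API might look like the sketch below. It assumes PyTorch 0.4 or later, and the comparison against the hand-derived value 2*y is added here purely as a sanity check:

```python
import torch

# Assumption: PyTorch >= 0.4, where tensors replace Variable.
x = torch.tensor([2.0], requires_grad=True)
y = x * x
y.retain_grad()  # ask autograd to keep the gradient of this non-leaf tensor
z = y * y
z.backward()

print(y.grad)    # tensor([8.])
# Sanity check against the analytic derivative dz/dy = 2 * y
print(torch.allclose(y.grad, 2 * y.detach()))  # True
```

retain_grad() has to be called before backward(); calling it afterwards does not recover a gradient that has already been discarded.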
Option 2:
Register a hook, which is basically a function called when that gradient is calculated. Then you can save it, assign it, print it, or do whatever else you need.
from __future__ import print_function
import torch
from torch.autograd import Variable
x = Variable(torch.Tensor([2]), requires_grad=True)
y = x * x
z = y * y
y.register_hook(print)  # the hook can be any callable that accepts the gradient
z.backward()
Output:
Variable containing:
 8
[torch.FloatTensor of size 1]
Source: https://discuss.pytorch.org/t/why-cant-i-see-grad-of-an-intermediate-variable/94/2
Also see: https://discuss.pytorch.org/t/why-cant-i-see-grad-of-an-intermediate-variable/94/7
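If you want to keep the gradient rather than print it, one possible pattern is a hook that stores it in a dictionary. This is only a sketch assuming PyTorch 0.4 or later; the names grads and save_grad are made up for illustration and are not part of PyTorch:

```python
import torch

grads = {}  # hypothetical container for captured gradients

def save_grad(name):
    """Return a hook that stores the incoming gradient under `name`."""
    def hook(grad):
        grads[name] = grad
    return hook

x = torch.tensor([2.0], requires_grad=True)
y = x * x
z = y * y

handle = y.register_hook(save_grad("y"))  # register_hook returns a removable handle
z.backward()
handle.remove()                           # detach the hook once it is no longer needed

print(grads["y"])  # tensor([8.])
```

Unlike retain_grad(), this does not populate y.grad; the gradient lives only wherever the hook puts it.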