Pytorch - Getting gradient for intermediate variab

As an exercise in the PyTorch framework (0.4.1), I am trying to display the gradient of X (gX or dSdX) in a simple linear layer (Z = X.W + B). To simplify my toy example, I call backward() on a sum of Z (not a loss).

To sum up, I want gX(dSdX) of S=sum(XW+B).

The problem is that the gradient of Z (dSdZ) is None. As a result, gX is wrong too of course.

import torch
X = torch.tensor([[0.5, 0.3, 2.1], [0.2, 0.1, 1.1]], requires_grad=True)
W = torch.tensor([[2.1, 1.5], [-1.4, 0.5], [0.2, 1.1]])
B = torch.tensor([1.1, -0.3])
Z = torch.nn.functional.linear(X, weight=W.t(), bias=B)
S = torch.sum(Z)
S.backward()
print("Z:\n", Z)
print("gZ:\n", Z.grad)
print("gX:\n", X.grad)

Result:

Z:
 tensor([[2.1500, 2.9100],
        [1.6000, 1.2600]], grad_fn=<ThAddmmBackward>)
gZ:
 None
gX:
 tensor([[ 3.6000, -0.9000,  1.3000],
        [ 3.6000, -0.9000,  1.3000]])

I have exactly the same result if I use nn.Module as below:

class Net1Linear(torch.nn.Module):
    def __init__(self, wi, wo,W,B):
        super(Net1Linear, self).__init__()
        self.linear1 = torch.nn.Linear(wi, wo)
        self.linear1.weight = torch.nn.Parameter(W.t())
        self.linear1.bias = torch.nn.Parameter(B)
    def forward(self, x):
        return self.linear1(x)
net = Net1Linear(3,2,W,B)
Z = net(X)
S = torch.sum(Z)
S.backward()
print("Z:\n", Z)
print("gZ:\n", Z.grad)
print("gX:\n", X.grad)

Tags: artificial-intelligence pytorch gradient-descent

2 Answers

等我变得足够好

Reply #2 · 2019-08-01 06:15

blue-phoenox, thanks for your answer. I am pretty happy to have heard about register_hook().

What led me to think that I had a wrong gX is that it was independent of the values of X. I will have to do the math to understand it. But using a categorical cross-entropy (CCE) loss instead of a sum makes things much cleaner, so I updated the example for those who might be interested. Using a sum was a bad idea in this case.

T_dec = torch.tensor([0, 1])
X = torch.tensor([[0.5, 0.8, 2.1], [0.7, 0.1, 1.1]], requires_grad=True)
W = torch.tensor([[2.7, 0.5], [-1.4, 0.5], [0.2, 1.1]])
B = torch.tensor([1.1, -0.3])
Z = torch.nn.functional.linear(X, weight=W.t(), bias=B)
print("Z:\n", Z)
L = torch.nn.CrossEntropyLoss()(Z,T_dec)
Z.register_hook(lambda gZ: print("gZ:\n",gZ))
L.backward()
print("gX:\n", X.grad)

Result:

Z:
 tensor([[1.7500, 2.6600],
        [3.0700, 1.3100]], grad_fn=<ThAddmmBackward>)
gZ:
 tensor([[-0.3565,  0.3565],
        [ 0.4266, -0.4266]])
gX:
 tensor([[-0.7843,  0.6774,  0.3209],
        [ 0.9385, -0.8105, -0.3839]])
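
For anyone who wants to check these numbers by hand, here is a minimal sketch. It assumes the standard result that, for CrossEntropyLoss with its default mean reduction, the gradient with respect to the logits is (softmax(Z) - one_hot(T)) / N, and that the chain rule through Z = XW + B gives gX = gZ · Wᵀ. The names probs, one_hot, gZ_manual and gX_manual are only illustrative, and the sketch was written against a recent PyTorch rather than 0.4.1.

```python
import torch

T_dec = torch.tensor([0, 1])
X = torch.tensor([[0.5, 0.8, 2.1], [0.7, 0.1, 1.1]], requires_grad=True)
W = torch.tensor([[2.7, 0.5], [-1.4, 0.5], [0.2, 1.1]])
B = torch.tensor([1.1, -0.3])

Z = torch.nn.functional.linear(X, weight=W.t(), bias=B)
L = torch.nn.CrossEntropyLoss()(Z, T_dec)
L.backward()

with torch.no_grad():
    # Softmax over the class dimension gives the predicted probabilities
    probs = torch.softmax(Z, dim=1)
    # One-hot encoding of the targets, built by indexing an identity matrix
    one_hot = torch.eye(Z.shape[1])[T_dec]
    # Gradient of the mean cross-entropy with respect to the logits Z
    gZ_manual = (probs - one_hot) / Z.shape[0]
    # Chain rule through Z = X @ W + B
    gX_manual = gZ_manual @ W.t()

print("gZ (manual):\n", gZ_manual)
print("gX (manual):\n", gX_manual)
print("matches autograd:", torch.allclose(X.grad, gX_manual, atol=1e-6))
```

Here gZ depends on Z (and therefore on X) through the softmax, which is why gX now changes with X, unlike in the sum example.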


迷人小祖宗

Reply #3 · 2019-08-01 06:20

First of all, gradients are only calculated for tensors where you enable them by setting requires_grad to True.

So your output is just as one would expect. You get the gradient for X.
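
To see why these values are the expected ones, here is a small check, assuming the same X, W and B as in the question. Since S sums every entry of Z, dS/dZ is a matrix of ones, and the chain rule through Z = XW + B gives dS/dX = ones · Wᵀ, which does not involve X at all. The names gZ_manual and gX_manual are just illustrative.

```python
import torch

X = torch.tensor([[0.5, 0.3, 2.1], [0.2, 0.1, 1.1]], requires_grad=True)
W = torch.tensor([[2.1, 1.5], [-1.4, 0.5], [0.2, 1.1]])
B = torch.tensor([1.1, -0.3])

Z = torch.nn.functional.linear(X, weight=W.t(), bias=B)
torch.sum(Z).backward()

# S = sum(Z), so dS/dZ is a matrix of ones with the same shape as Z
gZ_manual = torch.ones_like(Z)
# Z = X @ W + B, so dS/dX = dS/dZ @ W^T (each row holds the row sums of W)
gX_manual = gZ_manual @ W.t()

print("gX (manual):\n", gX_manual)
print("matches autograd:", torch.allclose(X.grad, gX_manual))
```

Each row of gX is simply the row sums of W (2.1 + 1.5, -1.4 + 0.5, 0.2 + 1.1), so it stays the same whatever values X holds.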

For performance reasons, PyTorch does not keep the gradients of intermediate (non-leaf) results such as Z after backward(). So .grad is only populated for the leaf tensors that you created with requires_grad set to True.

However, you can use register_hook to extract the intermediate gradient during the backward pass, or to save it manually. Here I just save it to the grad attribute of tensor Z:

import torch

# function to extract grad
def set_grad(var):
    def hook(grad):
        var.grad = grad
    return hook

X = torch.tensor([[0.5, 0.3, 2.1], [0.2, 0.1, 1.1]], requires_grad=True)
W = torch.tensor([[2.1, 1.5], [-1.4, 0.5], [0.2, 1.1]])
B = torch.tensor([1.1, -0.3])
Z = torch.nn.functional.linear(X, weight=W.t(), bias=B)

# register_hook for Z
Z.register_hook(set_grad(Z))

S = torch.sum(Z)
S.backward()
print("Z:\n", Z)
print("gZ:\n", Z.grad)
print("gX:\n", X.grad)

This will output:

Z:
 tensor([[2.1500, 2.9100],
        [1.6000, 1.2600]], grad_fn=<ThAddmmBackward>)
gZ:
 tensor([[1., 1.],
        [1., 1.]])
gX:
 tensor([[ 3.6000, -0.9000,  1.3000],
        [ 3.6000, -0.9000,  1.3000]])
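
As an alternative to the hook above, a non-leaf tensor can be asked to keep its gradient with Tensor.retain_grad(). This is only a sketch: I am assuming a PyTorch version that provides retain_grad() on tensors, and I have not verified it on 0.4.1 specifically.

```python
import torch

X = torch.tensor([[0.5, 0.3, 2.1], [0.2, 0.1, 1.1]], requires_grad=True)
W = torch.tensor([[2.1, 1.5], [-1.4, 0.5], [0.2, 1.1]])
B = torch.tensor([1.1, -0.3])
Z = torch.nn.functional.linear(X, weight=W.t(), bias=B)

# Ask autograd to keep the gradient of this intermediate (non-leaf) tensor
Z.retain_grad()

S = torch.sum(Z)
S.backward()
print("gZ:\n", Z.grad)
print("gX:\n", X.grad)
```

The call has to happen before backward(), otherwise Z.grad is still None afterwards.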

Hope this helps!

By the way: normally you would want the gradient to be enabled for your parameters, that is, your weights and biases. With the current setup, an optimizer would alter your inputs X rather than your weights W and bias B. So usually requires_grad is set for W and B in such a case.
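
A minimal sketch of that more usual setup, assuming plain SGD and an arbitrary learning rate of 0.1 (both are illustrative choices, as is the W_before name):

```python
import torch

# The input is plain data here, so no gradient is requested for it
X = torch.tensor([[0.5, 0.3, 2.1], [0.2, 0.1, 1.1]])
# Parameters: these are what an optimizer should update
W = torch.tensor([[2.1, 1.5], [-1.4, 0.5], [0.2, 1.1]], requires_grad=True)
B = torch.tensor([1.1, -0.3], requires_grad=True)

# lr=0.1 is an arbitrary value chosen for this illustration
optimizer = torch.optim.SGD([W, B], lr=0.1)

Z = torch.nn.functional.linear(X, weight=W.t(), bias=B)
S = torch.sum(Z)

optimizer.zero_grad()
S.backward()
print("gW:\n", W.grad)
print("gB:\n", B.grad)
print("gX:\n", X.grad)  # None, because X does not require a gradient

W_before = W.detach().clone()
optimizer.step()  # updates W and B in place; X is left untouched
print("W changed:", not torch.equal(W_before, W.detach()))
```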
