Question:
I am new to PyTorch. I want to understand why we can't call the backward() function on a Variable containing a tensor of, say, size [2,2], and why, if we do want to call it on such a Variable, we first have to define a gradient tensor and then call backward() on the Variable with respect to those gradients.

Answer 1:
From the tutorial on autograd:
If you want to compute the derivatives, you can call .backward() on a Variable. If Variable is a scalar (i.e. it holds a one element data), you don’t need to specify any arguments to backward(), however if it has more elements, you need to specify a grad_output argument that is a tensor of matching shape.
Basically, to start the chain rule you need a gradient AT THE OUTPUT to get it going. When the output is a scalar loss function (which it usually is; normally you begin the backward pass at the loss variable), it's an implied value of 1.0.
From the tutorial:
Let's backprop now. out.backward() is equivalent to doing out.backward(torch.Tensor([1.0])).
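A minimal sketch of that equivalence (using the current torch API, where Variable has been merged into Tensor; the variable names and values are just for illustration):

    import torch

    # Build a small graph that ends in a scalar, as in the tutorial's example.
    x = torch.ones(2, 2, requires_grad=True)
    y = x + 2
    out = (y * y * 3).mean()      # scalar output

    out.backward()                # implied output gradient of 1.0
    print(x.grad)                 # each entry is 4.5

    # The explicit form produces the same gradients:
    x2 = torch.ones(2, 2, requires_grad=True)
    out2 = ((x2 + 2) ** 2 * 3).mean()
    out2.backward(torch.tensor(1.0))
    print(x2.grad)                # same values as above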
But maybe you only want to update a subgraph (somewhere deep in the network), and the value of a Variable there is a matrix of weights. Then you have to tell it where to begin. From one of their chief devs (somewhere in the links below):
Yes, that's correct. We only support differentiation of scalar functions, so if you want to start backward from a non-scalar value you need to provide dout / dy
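A sketch of that non-scalar case, which is the [2,2] situation from the question (the shapes and the all-ones gradient are just illustrative assumptions):

    import torch

    x = torch.ones(2, 2, requires_grad=True)
    y = x * 2                          # y has shape [2, 2], so it is not a scalar

    # y.backward() alone raises a RuntimeError, because the output gradient
    # cannot be implied for a non-scalar output.
    # Supplying dout/dy with the same shape as y starts the chain rule at y:
    grad_output = torch.ones(2, 2)     # pretend every element of y feeds the loss with weight 1
    y.backward(grad_output)

    print(x.grad)                      # dout/dx = dout/dy * dy/dx = 1 * 2 = 2 everywhere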
The gradients argument:
https://discuss.pytorch.org/t/how-the-backward-works-for-torch-variable/907/8 (an OK explanation)
"Pytorch, what are the gradient arguments" (a good explanation)
http://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html (the tutorial)