Why should the backward function be called only on a one-element tensor, or with gradients w.r.t. the Variable?

Published 2019-05-29 04:45

Question:

I am new to PyTorch. I want to understand why we can't call the backward function on a Variable containing a tensor of, say, size [2, 2]. And if we do want to call it on a Variable containing a tensor of size [2, 2], we first have to define a gradient tensor and then call backward on the Variable with respect to that gradient.
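For concreteness, a minimal sketch of the situation (written against the current PyTorch API, where Variable has been merged into Tensor; the shapes and values are arbitrary):

    import torch

    x = torch.ones(2, 2, requires_grad=True)
    y = x * 2                      # y has shape [2, 2], so it is not a scalar

    # y.backward()                 # raises: grad can be implicitly created only for scalar outputs

    # Supplying a gradient tensor of the same shape as y makes it work:
    y.backward(torch.ones(2, 2))
    print(x.grad)                  # tensor([[2., 2.], [2., 2.]])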

Answer 1:

From the tutorial on autograd:

If you want to compute the derivatives, you can call .backward() on a Variable. If Variable is a scalar (i.e. it holds a one element data), you don’t need to specify any arguments to backward(), however if it has more elements, you need to specify a grad_output argument that is a tensor of matching shape.

Basically, to start the chain rule you need a gradient AT THE OUTPUT to get it going. In the event the output is a scalar loss function (which it usually is; normally you are beginning the backward pass at the loss variable), it is an implied value of 1.0.

From the tutorial:

Let's backprop now. out.backward() is equivalent to doing out.backward(torch.Tensor([1.0]))
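A small sketch of that equivalence (the names x and out here are just placeholders, not the tutorial's exact code):

    import torch

    x = torch.ones(2, 2, requires_grad=True)
    out = (x * 3).mean()           # a scalar: a single element

    out.backward()                 # same as out.backward(torch.tensor(1.0)):
                                   # the implied gradient at the output is 1.0
    print(x.grad)                  # every element is d(out)/dx = 3/4 = 0.75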

But maybe you only want to update a subgraph (somewhere deep in the network), and the value of a Variable is a matrix of weights. Then you have to tell it where to begin. From one of their chief devs (somewhere in the links):

Yes, that's correct. We only support differentiation of scalar functions, so if you want to start backward from a non-scalar value you need to provide dout / dy
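A sketch of what providing that dout / dy looks like when the starting value y is a [2, 2] matrix (the particular weighting below is just for illustration):

    import torch

    x = torch.ones(2, 2, requires_grad=True)
    y = x ** 2                     # non-scalar output, shape [2, 2]

    # dout/dy: the gradient of some (imaginary) scalar loss with respect to y
    dout_dy = torch.tensor([[1.0, 0.1],
                            [0.1, 1.0]])

    y.backward(dout_dy)            # chain rule continues from here: x.grad = dout_dy * dy/dx
    print(x.grad)                  # dy/dx = 2*x = 2, so x.grad = [[2.0, 0.2], [0.2, 2.0]]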

The gradients argument:

- https://discuss.pytorch.org/t/how-the-backward-works-for-torch-variable/907/8 (an OK explanation)
- "Pytorch, what are the gradient arguments" (a good explanation)
- http://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html (the tutorial)