Question:
I am new to PyTorch. I want to understand why we can't call the backward() function on a Variable containing a tensor of, say, size [2,2], and why, if we do want to call it on such a Variable, we first have to define a gradient tensor and then call backward() on the Variable with respect to those gradients.

Answer 1:
From the tutorial on autograd:
If you want to compute the derivatives, you can call .backward() on a Variable. If Variable is a scalar (i.e. it holds a one element data), you don’t need to specify any arguments to backward(), however if it has more elements, you need to specify a grad_output argument that is a tensor of matching shape.
Basically, to start the chain rule you need a gradient AT THE OUTPUT to get it going. When the output is a scalar loss function (which it usually is; normally you begin the backward pass at the loss variable), it's an implied value of 1.0.
From the tutorial:
Let's backprop now. out.backward() is equivalent to doing out.backward(torch.Tensor([1.0])).
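A minimal sketch of that equivalence (using the current torch API, where Variable has been merged into Tensor; the variable names and values are just for illustration):

    import torch

    # Build a small graph that ends in a scalar, as in the tutorial's example.
    x = torch.ones(2, 2, requires_grad=True)
    y = x + 2
    out = (y * y * 3).mean()      # scalar output

    out.backward()                # implied output gradient of 1.0
    print(x.grad)                 # each entry is 4.5

    # The explicit form produces the same gradients:
    x2 = torch.ones(2, 2, requires_grad=True)
    out2 = ((x2 + 2) ** 2 * 3).mean()
    out2.backward(torch.tensor(1.0))
    print(x2.grad)                # same values as above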
But maybe you only want to update a subgraph (somewhere deep in the network), and the value of a Variable there is a matrix of weights. Then you have to tell it where to begin. From one of their chief devs (somewhere in the links below):
Yes, that's correct. We only support differentiation of scalar functions, so if you want to start backward from a non-scalar value you need to provide dout / dy
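A sketch of that non-scalar case, which is the [2,2] situation from the question (the shapes and the all-ones gradient are just illustrative assumptions):

    import torch

    x = torch.ones(2, 2, requires_grad=True)
    y = x * 2                          # y has shape [2, 2], so it is not a scalar

    # y.backward() alone raises a RuntimeError, because the output gradient
    # cannot be implied for a non-scalar output.
    # Supplying dout/dy with the same shape as y starts the chain rule at y:
    grad_output = torch.ones(2, 2)     # pretend every element of y feeds the loss with weight 1
    y.backward(grad_output)

    print(x.grad)                      # dout/dx = dout/dy * dy/dx = 1 * 2 = 2 everywhere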
The gradients argument:
https://discuss.pytorch.org/t/how-the-backward-works-for-torch-variable/907/8 (an OK explanation)
"Pytorch, what are the gradient arguments" (a good explanation)
http://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html (the tutorial)