Gradient is None in PyTorch when it shouldn't be

I am trying to trace the gradient of a variable using PyTorch. I pass that variable to a first function, which searches for the minimum value of some other variable; the output of the first function is then fed into a second function, and the whole thing repeats multiple times.

Here is my code:

import torch

def myFirstFunction(parameter_current_here):
    optimalValue = 100000000000000
    Optimal = 100000000000000
    for j in range(2, 10):
        i = torch.ones(1, requires_grad=True)*j
        with torch.enable_grad():
            optimalValueNow = i*parameter_current_here.sum()
        if (optimalValueNow < optimalValue):
            optimalValue = optimalValueNow
            Optimal = i
    return optimalValue, Optimal

def mySecondFunction(Current):
    with torch.enable_grad():
        y = (20*Current)/2 + (Current**2)/10
    return y

counter = 0
while counter < 5:
    parameter_current = torch.randn(2, 2, requires_grad=True)

    outputMyFirstFunction = myFirstFunction(parameter_current)
    outputmySecondFunction = mySecondFunction(outputMyFirstFunction[1])
    outputmySecondFunction.backward()

    print("outputMyFirstFunction after backward:",
               outputMyFirstFunction)
    print("outputmySecondFunction after backward:",
               outputmySecondFunction)
    print("parameter_current Gradient after backward:",
               parameter_current.grad)

    counter = counter + 1

parameter_current.grad is None in every iteration, when it obviously shouldn't be. What am I doing wrong, and how can I fix it?

Your help on this would be highly appreciated. Thanks a lot!

Aly

Answer 1:

I'm guessing the problem is the with torch.enable_grad(): statements. After you exit the with statement, torch.enable_grad() no longer applies, and torch will clear the grads after the functions are run.

Answer 2:

Since it is not really clear to me what you actually want to achieve, besides computing gradients for parameter_current, I will just focus on describing why it doesn't work and what you can do to actually compute gradients.

I've added some comments in the code to make it clearer what the problem is.

In short, the problem is that your parameter_current is not part of the computation of your loss, that is, of the tensor you call backward() on, which is outputmySecondFunction.

So currently gradients are only computed with respect to i, because you created it with requires_grad=True.
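To make this concrete, here is a minimal sketch of the same situation, stripped of the loop. The names x, w, unused and loss are made up for illustration (x stands in for parameter_current, w for i); they are not from the question's code.

```python
import torch

# x plays the role of parameter_current, w plays the role of i
# (both names are hypothetical stand-ins).
x = torch.randn(2, 2, requires_grad=True)
w = torch.ones(1, requires_grad=True)

# This value depends on x, but it is never passed on to the loss.
unused = w * x.sum()

# The loss is built from w alone, so the graph behind it never reaches x.
loss = (20 * w) / 2 + (w ** 2) / 10
loss.backward()

print(x.grad)  # None, because there is no path from loss back to x
print(w.grad)  # a tensor, because w is part of the computation of loss
```

As far as I can tell this is exactly what happens in the question: the product with parameter_current is computed, but the tensor that backward() is called on is derived from i only.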

Please check the comments for details:

import torch

def myFirstFunction(parameter_current_here):
    # I removed some stuff to reduce it to the core features
    # removed torch.enable_grad(), since it is enabled by default
    # removed Optimal=100000000000000 and Optimal=i; the function now returns i directly
    optimalValue=100000000000000
    for j in range(2,10):
        # Are you sure you want to compute gradients for this tensor i?
        # Because this is actually what requires_grad=True does.
        # Just as a side note: this isn't your problem, but it affects the performance of the model.
        i= torch.ones(1,requires_grad=True)*j
        optimalValueNow=i*parameter_current_here.sum()
        if (optimalValueNow<optimalValue):
            optimalValue=optimalValueNow

    # Part Problem 1:
    # optimalValueNow is the value that was multiplied with your parameter_current
    # i is just your tensor i, nothing else; it does not depend on parameter_current
    # let's now jump to the output below in the loop: outputMyFirstFunction
    return optimalValueNow,i

def mySecondFunction(Current):
    y=(20*Current)/2 + (Current**2)/10
    return y

counter=0
while counter<5:
    parameter_current = torch.randn(2, 2,requires_grad=True)

    # Part Problem 2:
    # this is a tuple (optimalValueNow,i) as described above
    outputMyFirstFunction=myFirstFunction(parameter_current)
    # now you are taking i as an input
    # and i is just torch.ones(1,requires_grad=True)*j
    # it has no connection to parameter_current
    # thus nothing is optimized
    outputmySecondFunction=mySecondFunction(outputMyFirstFunction[1])

    # calculating gradients: since parameter_current is not part of the computation,
    # no gradients will be computed for it; gradients only flow back through i
    # Btw., if you had not set requires_grad=True for i, you would actually get an error message
    # for calling backward() on this
    outputmySecondFunction.backward()

    print("outputMyFirstFunction after backward:",outputMyFirstFunction)
    print("outputmySecondFunction after backward:",outputmySecondFunction)
    print("parameter_current Gradient after backward:",parameter_current.grad)

    counter=counter+1
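The side note in the comments about the error message can be reproduced with a small sketch. Here i is a hypothetical stand-in created without requires_grad=True, and raised is just a helper flag for the demonstration.

```python
import torch

# Hypothetical stand-in for i, created WITHOUT requires_grad=True.
i = torch.ones(1) * 9

# Same formula as mySecondFunction in the code above.
y = (20 * i) / 2 + (i ** 2) / 10

# Nothing in the computation of y requires gradients, so y has no graph
# behind it and backward() raises a RuntimeError.
raised = False
try:
    y.backward()
except RuntimeError as err:
    raised = True
    print("backward() failed:", err)
```

In that sense the requires_grad=True on i hides the real problem: backward() succeeds, but it differentiates with respect to the wrong tensor.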

So if you want to compute gradients for parameter_current, you simply have to make sure it is part of the computation of the tensor you call backward() on. You can do so, for example, by changing:

outputmySecondFunction=mySecondFunction(outputMyFirstFunction[1])

to:

outputmySecondFunction=mySecondFunction(outputMyFirstFunction[0])

As soon as you make this change, you will get gradients for parameter_current!

I hope it helps!

Full working code:

import torch

def myFirstFunction(parameter_current_here):
    optimalValue=100000000000000
    for j in range(2,10):
        i= torch.ones(1,requires_grad=True)*j
        optimalValueNow=i*parameter_current_here.sum()
        if (optimalValueNow<optimalValue):
            optimalValue=optimalValueNow

    return optimalValueNow,i

def mySecondFunction(Current):
    y=(20*Current)/2 + (Current**2)/10
    return y

counter=0
while counter<5:
    parameter_current = torch.randn(2, 2,requires_grad=True)
    outputMyFirstFunction=myFirstFunction(parameter_current)
    outputmySecondFunction=mySecondFunction(outputMyFirstFunction[0]) # changed line
    outputmySecondFunction.backward()

    print("outputMyFirstFunction after backward:",outputMyFirstFunction)
    print("outputmySecondFunction after backward:",outputmySecondFunction)
    print("parameter_current Gradient after backward:",parameter_current.grad)

    counter=counter+1

Output:

outputMyFirstFunction after backward: (tensor([ 1.0394]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([ 10.5021])
parameter_current Gradient after backward: tensor([[ 91.8709,  91.8709],
        [ 91.8709,  91.8709]])
outputMyFirstFunction after backward: (tensor([ 13.1481]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([ 148.7688])
parameter_current Gradient after backward: tensor([[ 113.6667,  113.6667],
        [ 113.6667,  113.6667]])
outputMyFirstFunction after backward: (tensor([ 5.7205]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([ 60.4772])
parameter_current Gradient after backward: tensor([[ 100.2969,  100.2969],
        [ 100.2969,  100.2969]])
outputMyFirstFunction after backward: (tensor([-13.9846]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([-120.2888])
parameter_current Gradient after backward: tensor([[ 64.8278,  64.8278],
        [ 64.8278,  64.8278]])
outputMyFirstFunction after backward: (tensor([-10.5533]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([-94.3959])
parameter_current Gradient after backward: tensor([[ 71.0040,  71.0040],
        [ 71.0040,  71.0040]])
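Note that the function above returns the values from the last loop iteration (i is 9 in every line of the output), not the minimum. If the goal is to pick the smallest candidate and still get gradients for parameter_current, one possible sketch is to build all candidates at once and select with torch.min, which is a swapped-in technique rather than the loop from the question. The names first_function, second_function, multipliers and candidates are made up for illustration, and the input is fixed to ones only to make the result reproducible.

```python
import torch

def first_function(param):
    # Plain multipliers 2..9; they do not need requires_grad.
    multipliers = torch.arange(2, 10, dtype=param.dtype)
    # One candidate per multiplier, each connected to param.
    candidates = multipliers * param.sum()
    # torch.min with dim returns the smallest value and its index;
    # the value stays connected to the autograd graph.
    best_value, best_index = torch.min(candidates, dim=0)
    return best_value, multipliers[best_index]

def second_function(current):
    return (20 * current) / 2 + (current ** 2) / 10

# Fixed input (an assumption for reproducibility; the question uses randn).
parameter_current = torch.ones(2, 2, requires_grad=True)

best_value, best_multiplier = first_function(parameter_current)
output = second_function(best_value)
output.backward()

print("best value:", best_value.item())
print("best multiplier:", best_multiplier.item())
print("gradient:", parameter_current.grad)
```

With this fixed input the sum is 4, so the smallest candidate is 8 with multiplier 2, and every entry of the gradient is (10 + 8/5) * 2 = 23.2.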