I am trying to get/trace the gradient of a variable using PyTorch: I have a variable, pass it to a first function that looks for the minimum value of some other variable, then the output of the first function is fed into a second function, and the whole thing repeats multiple times.
Here is my code:
import torch

def myFirstFunction(parameter_current_here):
    optimalValue = 100000000000000
    Optimal = 100000000000000
    for j in range(2, 10):
        i = torch.ones(1, requires_grad=True) * j
        with torch.enable_grad():
            optimalValueNow = i * parameter_current_here.sum()
        if (optimalValueNow < optimalValue):
            optimalValue = optimalValueNow
            Optimal = i
    return optimalValue, Optimal

def mySecondFunction(Current):
    with torch.enable_grad():
        y = (20 * Current) / 2 + (Current ** 2) / 10
    return y

counter = 0
while counter < 5:
    parameter_current = torch.randn(2, 2, requires_grad=True)
    outputMyFirstFunction = myFirstFunction(parameter_current)
    outputmySecondFunction = mySecondFunction(outputMyFirstFunction[1])
    outputmySecondFunction.backward()
    print("outputMyFirstFunction after backward:", outputMyFirstFunction)
    print("outputmySecondFunction after backward:", outputmySecondFunction)
    print("parameter_current Gradient after backward:", parameter_current.grad)
    counter = counter + 1
parameter_current.grad is None for all iterations, when it obviously shouldn't be. What am I doing wrong? And how can I fix it?
Your help on this would be highly appreciated. Thanks a lot!
Aly
I'm guessing the problem is the with torch.enable_grad(): statements. Once you have exited the with statement, torch.enable_grad() no longer applies and torch will clear the grads after the functions are run.
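As a side note, you can check how these context managers behave: gradient tracking is on by default, and torch.enable_grad() only makes a difference inside a torch.no_grad() block. A minimal sketch:

import torch

print(torch.is_grad_enabled())          # True: grad mode is on by default

with torch.no_grad():
    print(torch.is_grad_enabled())      # False inside no_grad()
    with torch.enable_grad():
        print(torch.is_grad_enabled())  # True again, only inside enable_grad()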
Since it is not really clear to me what you actually want to achieve, besides computing gradients for parameter_current, I will just focus on describing why it doesn't work and what you can do to actually compute gradients.
I've added some comments to the code to make the problem clearer. But in short, the problem is that your parameter_current is not part of the computation of your loss, i.e. of the tensor you call backward() on, which is outputmySecondFunction.
So currently you are only computing gradients for i, as you have set requires_grad=True for it.
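To illustrate the general rule with a minimal sketch (independent of your functions): backward() only populates .grad for leaf tensors that actually take part in the computation of the tensor you call it on.

import torch

a = torch.randn(2, 2, requires_grad=True)   # used in the computation below
b = torch.randn(2, 2, requires_grad=True)   # never used below

out = (3 * a).sum()   # out depends only on a
out.backward()

print(a.grad)   # tensor of 3s, since out depends on a
print(b.grad)   # None, b never entered the computation of out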
Please check the comments for details:
import torch

def myFirstFunction(parameter_current_here):
    # I removed some stuff to reduce it to the core features
    # removed torch.enable_grad(), since it is enabled by default
    # removed Optimal=100000000000000 and Optimal=i, they are not used
    optimalValue = 100000000000000
    for j in range(2, 10):
        # Are you sure you want to compute gradients for this tensor i?
        # Because this is actually what requires_grad=True does.
        # Just as a side note, this isn't your problem, but it affects the performance of the model.
        i = torch.ones(1, requires_grad=True) * j
        optimalValueNow = i * parameter_current_here.sum()
        if (optimalValueNow < optimalValue):
            optimalValue = optimalValueNow
    # Part Problem 1:
    # optimalValueNow is multiplied with your parameter_current
    # i is just your parameter i, nothing else
    # let's jump now to the output below in the loop: outputMyFirstFunction
    return optimalValueNow, i

def mySecondFunction(Current):
    y = (20 * Current) / 2 + (Current ** 2) / 10
    return y

counter = 0
while counter < 5:
    parameter_current = torch.randn(2, 2, requires_grad=True)
    # Part Problem 2:
    # this is a tuple (optimalValueNow, i) as described above
    outputMyFirstFunction = myFirstFunction(parameter_current)
    # now you are taking i as an input
    # and i is just torch.ones(1, requires_grad=True)*j
    # it has no connection to parameter_current
    # thus nothing is optimized
    outputmySecondFunction = mySecondFunction(outputMyFirstFunction[1])
    # when calling backward, since parameter_current is not part of the computation,
    # no gradients will be computed for it, you only get gradients for i
    # Btw. if you had not set requires_grad=True for i, you would actually get an error message
    # when calling backward on this
    outputmySecondFunction.backward()
    print("outputMyFirstFunction after backward:", outputMyFirstFunction)
    print("outputmySecondFunction after backward:", outputmySecondFunction)
    print("parameter_current Gradient after backward:", parameter_current.grad)
    counter = counter + 1
So if you want to compute gradients for parameter_current, you simply have to make sure it is part of the computation of the tensor you call backward() on. You can do so, for example, by changing:
outputmySecondFunction = mySecondFunction(outputMyFirstFunction[1])
to:
outputmySecondFunction = mySecondFunction(outputMyFirstFunction[0])
As soon as you change this, you will get gradients for parameter_current!
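As a side note (see also the comment in the code above): if you don't actually need gradients with respect to i, you can create it without requires_grad=True, for example with torch.full. A minimal sketch, reusing the names from your function:

import torch

parameter_current_here = torch.randn(2, 2, requires_grad=True)

for j in range(2, 10):
    i = torch.full((1,), float(j))          # constant, no grad tracking needed
    optimalValueNow = i * parameter_current_here.sum()

optimalValueNow.backward()
print(parameter_current_here.grad)          # gradients still flow into parameter_current_here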
I hope it helps!
Full working code:
import torch

def myFirstFunction(parameter_current_here):
    optimalValue = 100000000000000
    for j in range(2, 10):
        i = torch.ones(1, requires_grad=True) * j
        optimalValueNow = i * parameter_current_here.sum()
        if (optimalValueNow < optimalValue):
            optimalValue = optimalValueNow
    return optimalValueNow, i

def mySecondFunction(Current):
    y = (20 * Current) / 2 + (Current ** 2) / 10
    return y

counter = 0
while counter < 5:
    parameter_current = torch.randn(2, 2, requires_grad=True)
    outputMyFirstFunction = myFirstFunction(parameter_current)
    outputmySecondFunction = mySecondFunction(outputMyFirstFunction[0])  # changed line
    outputmySecondFunction.backward()
    print("outputMyFirstFunction after backward:", outputMyFirstFunction)
    print("outputmySecondFunction after backward:", outputmySecondFunction)
    print("parameter_current Gradient after backward:", parameter_current.grad)
    counter = counter + 1
Output:
outputMyFirstFunction after backward: (tensor([ 1.0394]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([ 10.5021])
parameter_current Gradient after backward: tensor([[ 91.8709, 91.8709],
[ 91.8709, 91.8709]])
outputMyFirstFunction after backward: (tensor([ 13.1481]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([ 148.7688])
parameter_current Gradient after backward: tensor([[ 113.6667, 113.6667],
[ 113.6667, 113.6667]])
outputMyFirstFunction after backward: (tensor([ 5.7205]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([ 60.4772])
parameter_current Gradient after backward: tensor([[ 100.2969, 100.2969],
[ 100.2969, 100.2969]])
outputMyFirstFunction after backward: (tensor([-13.9846]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([-120.2888])
parameter_current Gradient after backward: tensor([[ 64.8278, 64.8278],
[ 64.8278, 64.8278]])
outputMyFirstFunction after backward: (tensor([-10.5533]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([-94.3959])
parameter_current Gradient after backward: tensor([[ 71.0040, 71.0040],
[ 71.0040, 71.0040]])