I am trying to implement a simple MDN that predicts the parameters of a distribution over a target variable instead of a point value, and then assigns probabilities to discrete bins of the point value. Narrowing down the issue, the code from which the 'None' springs is:
import torch
# params
tte_bins = np.linspace(
start=0,
stop=399,
num=400,
dtype='float32'
).reshape(1, 1, -1)
bins = torch.tensor(tte_bins, dtype=torch.float32)
x_train = np.random.randn(1, 1024, 3)
y_labels = np.random.randint(low=0, high=399, size=(1, 1024))
y_train = np.eye(400)[y_labels]
# data
in_train = torch.tensor(x_train[0:1, :, :], dtype=torch.float)
in_train = (in_train - torch.mean(in_train)) / torch.std(in_train)
out_train = torch.tensor(y_train[0:1, :, :], dtype=torch.float)
# model
linear = torch.nn.Linear(in_features=3, out_features=2)
lin = linear(in_train)
preds = torch.exp(lin)
# intermediate values
alpha = torch.clamp(preds[0:1, :, 0:1], 0, 500)
beta = torch.clamp(preds[0:1, :, 1:2], 0, 100)
# probs
p1 = torch.exp(-torch.pow(bins / alpha, beta))
p2 = torch.exp(-torch.pow((bins + 1.0) / alpha, beta))
probs = p1 - p2
# loss
loss = torch.mean(torch.pow(out_train - probs, 2))
# gradients
loss.backward()
for p in linear.parameters():
print(p.grad, 'gradient')
in_train has shape: [1, 1024, 3], out_train has shape: [1, 1024, 400], bins has shape: [1, 1, 400]. All the broadcasting etc.. appears find, the resulting matrices (like alpha/beta/loss) are the right shape and have the right values - there's simply no gradients
edit: added loss.backward()
and x_train/y_train
, now I have nans