OpenAI's REINFORCE and actor-critic examples for reinforcement learning contain the following code:
REINFORCE:
policy_loss = torch.cat(policy_loss).sum()
actor-critic:
loss = torch.stack(policy_losses).sum() + torch.stack(value_losses).sum()
One uses torch.cat, the other uses torch.stack.
As far as I can tell, the documentation doesn't clearly distinguish between them.
What is the difference between the two functions, and when should each be used?
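
To make the question concrete, here is a minimal sketch of how the two calls behave on small tensors. The loss values and shapes are made-up stand-ins, not taken from the examples; the assumption is that per-step losses are either one-element tensors of shape (1,) or zero-dimensional scalars.

```python
import torch

# Hypothetical per-step losses, each a one-element tensor of shape (1,).
losses_1d = [torch.tensor([1.0]), torch.tensor([2.0]), torch.tensor([3.0])]

# torch.cat joins tensors along an existing dimension: three (1,) tensors -> (3,).
cat_1d = torch.cat(losses_1d)
# torch.stack inserts a new dimension: three (1,) tensors -> (3, 1).
stack_1d = torch.stack(losses_1d)

print(cat_1d.shape, stack_1d.shape)
# After .sum() both give the same scalar, so the final loss is unaffected here.
print(cat_1d.sum().item(), stack_1d.sum().item())

# Hypothetical zero-dimensional (scalar) losses.
losses_0d = [torch.tensor(1.0), torch.tensor(2.0)]

# torch.stack works on scalars: two 0-d tensors -> shape (2,).
stack_0d = torch.stack(losses_0d)

# torch.cat needs an existing dimension to join along, so 0-d inputs are rejected.
try:
    torch.cat(losses_0d)
    cat_0d_ok = True
except RuntimeError:
    cat_0d_ok = False

print(stack_0d.shape, cat_0d_ok)
```

In this sketch the two functions differ only in the shape of the intermediate tensor, and in whether zero-dimensional inputs are accepted at all. Whether that accounts for the choice made in each example depends on the shapes of the losses collected there.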