(pytorch beginner here)
I would like to add the L1 regularizer to the activations output from a ReLU.
More generally, how does one add a regularizer only to a particular layer in the network?
This post may be related:
Adding L1/L2 regularization in PyTorch?
However, either it is not related, or else I do not understand the answer:
it refers to an L2 regularizer applied in the optimization, which is a different thing.
In other words, if the overall desired loss is
crossentropy + lambda1*L1(layer1) + lambda2*L1(layer2) + ...
I believe the parameter supplied to torch.optim.Adagrad is applied only to the cross-entropy loss.
Or perhaps it is applied to all parameters (weights) across the network. But in any case
it does not seem to allow applying a regularizer to a single layer of activations,
and it does not provide an L1 loss.
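For concreteness, a minimal sketch of what I understand that post to suggest (the weight_decay value here is only illustrative): as far as I can tell, weight_decay applies an L2-style penalty to every parameter handed to the optimizer, not an L1 penalty on one layer's activations.
import torch
# weight_decay penalizes (decays) all parameters passed to the optimizer;
# it is not an L1 penalty on the activations of a chosen layer.
optimizer = torch.optim.Adagrad(model.parameters(), lr=1e-2, weight_decay=0.01)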
Another relevant topic is nn.modules.loss, which includes L1Loss().
From the documentation, I do not yet understand how to use this.
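If I read it correctly, L1Loss compares two tensors, so presumably one could repurpose it as an activation penalty by comparing against zeros; this is only my guess, not something stated in the documentation.
import torch
from torch import nn
l1_loss = nn.L1Loss(reduction='sum')  # sum of absolute element-wise differences
# Hypothetical usage: layer1_out would be the ReLU activation tensor to penalize.
l1_penalty = l1_loss(layer1_out, torch.zeros_like(layer1_out))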
Lastly, there is this module https://github.com/pytorch/pytorch/blob/master/torch/legacy/nn/L1Penalty.py which seems closest to the goal, but it is called "legacy". Why is that?
Here is how you do this:
- In your Module's forward, return the final output along with the outputs of the layers for which you want to apply L1 regularization.
- The loss variable will be the sum of the cross-entropy loss of the output w.r.t. the targets and the L1 penalties.
Here's an example:
import torch
from torch.autograd import Variable
from torch.nn import functional as F
class MLP(torch.nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.linear1 = torch.nn.Linear(128, 32)
        self.linear2 = torch.nn.Linear(32, 16)
        self.linear3 = torch.nn.Linear(16, 2)

    def forward(self, x):
        layer1_out = F.relu(self.linear1(x))
        layer2_out = F.relu(self.linear2(layer1_out))
        out = self.linear3(layer2_out)
        # also return the activations we want to penalize
        return out, layer1_out, layer2_out
batchsize = 4
lambda1, lambda2 = 0.5, 0.01
model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
# Usually the following code is looped over all batches,
# but let's just do a dummy batch for brevity.
inputs = Variable(torch.rand(batchsize, 128))
targets = Variable(torch.ones(batchsize).long())
optimizer.zero_grad()
outputs, layer1_out, layer2_out = model(inputs)
cross_entropy_loss = F.cross_entropy(outputs, targets)
# L1 penalty on the layer-1 activations, L2 penalty on the layer-2 activations
l1_regularization = lambda1 * torch.norm(layer1_out, 1)
l2_regularization = lambda2 * torch.norm(layer2_out, 2)
loss = cross_entropy_loss + l1_regularization + l2_regularization
loss.backward()
optimizer.step()
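As a side note, if you prefer not to change forward()'s return signature, a forward hook can capture a particular layer's output instead; here is a rough sketch reusing model, inputs, targets, lambda1 and F from above (the hook sees the output of linear1 before the ReLU, so F.relu is re-applied when building the penalty):
activations = {}

def save_linear1_output(module, inputs, output):
    # called on every forward pass; stores the output of self.linear1
    activations['linear1'] = output

hook_handle = model.linear1.register_forward_hook(save_linear1_output)
outputs, _, _ = model(inputs)
l1_penalty = lambda1 * torch.norm(F.relu(activations['linear1']), 1)
loss = F.cross_entropy(outputs, targets) + l1_penalty
hook_handle.remove()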
@Sasank Chilamkurthy
Regularization should be applied to the weight parameters of each layer of the model, not to the output of each layer. Please look below:
import torch
from torch.autograd import Variable
from torch.nn import functional as F
class MLP(torch.nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.linear1 = torch.nn.Linear(128, 32)
        self.linear2 = torch.nn.Linear(32, 16)
        self.linear3 = torch.nn.Linear(16, 2)

    def forward(self, x):
        layer1_out = F.relu(self.linear1(x))
        layer2_out = F.relu(self.linear2(layer1_out))
        out = self.linear3(layer2_out)
        return out
batchsize = 4
lambda1, lambda2 = 0.5, 0.01
model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
inputs = Variable(torch.rand(batchsize, 128))
targets = Variable(torch.ones(batchsize).long())
# The accumulators must be float tensors: torch.tensor(0) creates integer tensors,
# and the in-place additions of float norms below would fail.
l1_regularization, l2_regularization = torch.tensor(0.), torch.tensor(0.)
optimizer.zero_grad()
outputs = model(inputs)
cross_entropy_loss = F.cross_entropy(outputs, targets)
# Sum the L1 and L2 norms of every parameter (weights and biases) in the model.
for param in model.parameters():
    l1_regularization += torch.norm(param, 1)
    l2_regularization += torch.norm(param, 2)
# Weight the penalties so that lambda1 and lambda2 are actually used.
loss = cross_entropy_loss + lambda1 * l1_regularization + lambda2 * l2_regularization
loss.backward()
optimizer.step()
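One refinement worth mentioning (just a hedged sketch, not part of the answer above): it is common to regularize only the weight matrices and to leave the biases unpenalized, which can be done by filtering named_parameters().
# Sketch: L1 penalty over weights only, skipping bias vectors.
l1_regularization = sum(
    param.abs().sum()
    for name, param in model.named_parameters()
    if 'bias' not in name
)
loss = cross_entropy_loss + lambda1 * l1_regularization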