(pytorch beginner here)
I would like to add the L1 regularizer to the activations output from a ReLU.
More generally, how does one add a regularizer only to a particular layer in the network?
This post may be related:
Adding L1/L2 regularization in PyTorch?
However, either it is not related, or else I do not understand the answer:
it refers to an L2 regularizer applied in the optimization, which is a different thing.
In other words, if the overall desired loss is
crossentropy + lambda1*L1(layer1) + lambda2*L1(layer2) + ...
I believe the parameter supplied to torch.optim.Adagrad is applied only to the cross-entropy loss.
Or perhaps it is applied to all parameters (weights) across the network. But in any case
it does not seem to allow applying a regularizer to a single layer of activations,
and it does not provide an L1 loss.
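For concreteness, a minimal sketch of what I understand that post to suggest (the weight_decay value here is only illustrative): as far as I can tell, weight_decay applies an L2-style penalty to every parameter handed to the optimizer, not an L1 penalty on one layer's activations.
import torch
# weight_decay penalizes (decays) all parameters passed to the optimizer;
# it is not an L1 penalty on the activations of a chosen layer.
optimizer = torch.optim.Adagrad(model.parameters(), lr=1e-2, weight_decay=0.01)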
Another relevant topic is nn.modules.loss, which includes L1Loss().
From the documentation, I do not yet understand how to use this.
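If I read it correctly, L1Loss compares two tensors, so presumably one could repurpose it as an activation penalty by comparing against zeros; this is only my guess, not something stated in the documentation.
import torch
from torch import nn
l1_loss = nn.L1Loss(reduction='sum')  # sum of absolute element-wise differences
# Hypothetical usage: layer1_out would be the ReLU activation tensor to penalize.
l1_penalty = l1_loss(layer1_out, torch.zeros_like(layer1_out))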
Lastly, there is this module https://github.com/pytorch/pytorch/blob/master/torch/legacy/nn/L1Penalty.py which seems closest to the goal, but it is called "legacy". Why is that?
Here is how you do this:
- In your Module's forward, return the final output along with the outputs of the layers for which you want to apply L1 regularization.
- The loss variable will be the sum of the cross-entropy loss of the output w.r.t. the targets and the L1 penalties.
Here's an example:
import torch
from torch.autograd import Variable
from torch.nn import functional as F
class MLP(torch.nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.linear1 = torch.nn.Linear(128, 32)
        self.linear2 = torch.nn.Linear(32, 16)
        self.linear3 = torch.nn.Linear(16, 2)

    def forward(self, x):
        layer1_out = F.relu(self.linear1(x))
        layer2_out = F.relu(self.linear2(layer1_out))
        out = self.linear3(layer2_out)
        # also return the activations we want to penalize
        return out, layer1_out, layer2_out
batchsize = 4
lambda1, lambda2 = 0.5, 0.01
model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
# Usually the following code is looped over all batches,
# but let's just do a dummy batch for brevity.
inputs = Variable(torch.rand(batchsize, 128))
targets = Variable(torch.ones(batchsize).long())
optimizer.zero_grad()
outputs, layer1_out, layer2_out = model(inputs)
cross_entropy_loss = F.cross_entropy(outputs, targets)
# L1 penalty on the layer-1 activations, L2 penalty on the layer-2 activations
l1_regularization = lambda1 * torch.norm(layer1_out, 1)
l2_regularization = lambda2 * torch.norm(layer2_out, 2)
loss = cross_entropy_loss + l1_regularization + l2_regularization
loss.backward()
optimizer.step()
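As a side note, if you prefer not to change forward()'s return signature, a forward hook can capture a particular layer's output instead; here is a rough sketch reusing model, inputs, targets, lambda1 and F from above (the hook sees the output of linear1 before the ReLU, so F.relu is re-applied when building the penalty):
activations = {}

def save_linear1_output(module, inputs, output):
    # called on every forward pass; stores the output of self.linear1
    activations['linear1'] = output

hook_handle = model.linear1.register_forward_hook(save_linear1_output)
outputs, _, _ = model(inputs)
l1_penalty = lambda1 * torch.norm(F.relu(activations['linear1']), 1)
loss = F.cross_entropy(outputs, targets) + l1_penalty
hook_handle.remove()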
@Sasank Chilamkurthy
Regularization should be applied to the weight parameters of each layer of the model, not to the output of each layer. Please look below:
import torch
from torch.autograd import Variable
from torch.nn import functional as F
class MLP(torch.nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.linear1 = torch.nn.Linear(128, 32)
        self.linear2 = torch.nn.Linear(32, 16)
        self.linear3 = torch.nn.Linear(16, 2)

    def forward(self, x):
        layer1_out = F.relu(self.linear1(x))
        layer2_out = F.relu(self.linear2(layer1_out))
        out = self.linear3(layer2_out)
        return out
batchsize = 4
lambda1, lambda2 = 0.5, 0.01
model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
inputs = Variable(torch.rand(batchsize, 128))
targets = Variable(torch.ones(batchsize).long())
# The accumulators must be float tensors: torch.tensor(0) creates integer tensors,
# and the in-place additions of float norms below would fail.
l1_regularization, l2_regularization = torch.tensor(0.), torch.tensor(0.)
optimizer.zero_grad()
outputs = model(inputs)
cross_entropy_loss = F.cross_entropy(outputs, targets)
# Sum the L1 and L2 norms of every parameter (weights and biases) in the model.
for param in model.parameters():
    l1_regularization += torch.norm(param, 1)
    l2_regularization += torch.norm(param, 2)
# Weight the penalties so that lambda1 and lambda2 are actually used.
loss = cross_entropy_loss + lambda1 * l1_regularization + lambda2 * l2_regularization
loss.backward()
optimizer.step()
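One refinement worth mentioning (just a hedged sketch, not part of the answer above): it is common to regularize only the weight matrices and to leave the biases unpenalized, which can be done by filtering named_parameters().
# Sketch: L1 penalty over weights only, skipping bias vectors.
l1_regularization = sum(
    param.abs().sum()
    for name, param in model.named_parameters()
    if 'bias' not in name
)
loss = cross_entropy_loss + lambda1 * l1_regularization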