Why is it in Pytorch when I make a COPY of a netwo

I wrote the following code as a test because in my original network I use a ModuleDict and depends on what index I feed it would slice and train only parts of that network.

I wanted to make sure that only the sliced layers would update their weight so I wrote some test code to double check. Well I am getting some weird results. Say if my model has 2 layers, layer1 is an FC and layer 2 is Conv2d, if I slice the network and ONLY use layer2 I would expect layer1's weight to be unchanged because they are unused and layer2's weight will get updated after 1 epoch.

So my plan was to used a for loop to grab all the weights from the network Before training then I would do it AFTER 1 optimizer.step(). Both times I would store those weights completely separate in 2 Python lists so I can compare their results later. Well for some reason the two lists are completely the same if I compare them with torch.equal() I thought its because maybe there is still some sort of hidden link in memory? So I tried to use .detach() on the weights when I grab them from the loop and the result is still the same. Layer2's weight should be different in this case because it should contain weights from the network before training.

Noted in the code below I am actually using layer1 and ignoring layer2.

Full code:

class mymodel(nn.Module):
    def __init__(self):
        super().__init__() 
        self.layer1 = nn.Linear(10, 5)
        self.layer2 = nn.Conv2d(1, 5, 4, 2, 1)
        self.act = nn.Sigmoid()
    def forward(self, x):
        x = self.layer1(x) #only layer1 and act are used layer 2 is ignored so only layer1 and act's weight should be updated
        x = self.act(x)
        return x
model = mymodel()

weights = []

for param in model.parameters(): # loop the weights in the model before updating and store them
    print(param.size())
    weights.append(param)

critertion = nn.BCELoss() #criterion and optimizer setup
optimizer = optim.Adam(model.parameters(), lr = 0.001)

foo = torch.randn(3, 10) #fake input
target = torch.randn(3, 5) #fake target

result = model(foo) #predictions and comparison and backprop
loss = criterion(result, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()


weights_after_backprop = [] # weights after backprop
for param in model.parameters():
    weights_after_backprop.append(param) # only layer1's weight should update, layer2 is not used

for i in zip(weights, weights_after_backprop):
    print(torch.equal(i[0], i[1]))

# **prints all Trues when "layer1" and "act" should be different, I have also tried to call param.detach in the loop but I got the same result.

You have to clone the parameters, otherwise you just copy the reference.

weights = []

for param in model.parameters():
    weights.append(param.clone())

criterion = nn.BCELoss() # criterion and optimizer setup
optimizer = optim.Adam(model.parameters(), lr=0.001)

foo = torch.randn(3, 10) # fake input
target = torch.randn(3, 5) # fake target

result = model(foo) # predictions and comparison and backprop
loss = criterion(result, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()


weights_after_backprop = [] # weights after backprop
for param in model.parameters():
    weights_after_backprop.append(param.clone()) # only layer1's weight should update, layer2 is not used

for i in zip(weights, weights_after_backprop):
    print(torch.equal(i[0], i[1]))

which gives