Shape of pytorch model.parameter is inconsistent with how it's defined in the model

Question:

I'm attempting to extract the weights and biases from a simple network built in PyTorch. My entire network is composed of nn.Linear layers. When I create a layer with nn.Linear(in_dim, out_dim), I expect the parameters returned by model.parameters() to have shape (in_dim, out_dim) for the weight and (out_dim,) for the bias. However, the weights that come out of model.parameters() instead have shape (out_dim, in_dim).

My intention is to perform the forward pass with plain matrix multiplication using only numpy, not any PyTorch. Because of this shape mismatch, the matrix multiplications throw an error. How can I fix this?

Here is my exact code:

import torch
import torch.nn as nn

class RNN(nn.Module):

    def __init__(self, dim_input, dim_recurrent, dim_output):

        super(RNN, self).__init__()

        self.dim_input = dim_input
        self.dim_recurrent = dim_recurrent
        self.dim_output = dim_output

        self.dense1 = nn.Linear(self.dim_input, self.dim_recurrent)
        self.dense2 = nn.Linear(self.dim_recurrent, self.dim_recurrent, bias=False)
        self.dense3 = nn.Linear(self.dim_input, self.dim_recurrent)
        self.dense4 = nn.Linear(self.dim_recurrent, self.dim_recurrent, bias=False)
        self.dense5 = nn.Linear(self.dim_recurrent, self.dim_output)

# There is a defined forward pass (omitted here)

model = RNN(12, 100, 6)

for i in model.parameters():
    print(i.shape)

The output is:

torch.Size([100, 12])
torch.Size([100])
torch.Size([100, 100])
torch.Size([100, 12])
torch.Size([100])
torch.Size([100, 100])
torch.Size([6, 100])
torch.Size([6])

The output should, if I'm correct, be:

torch.Size([12, 100])
torch.Size([100])
torch.Size([100, 100])
torch.Size([12, 100])
torch.Size([100])
torch.Size([100, 100])
torch.Size([100, 6])
torch.Size([6])

What is my issue?

Answer 1:

What you see there is not an inconsistency: (out_dim, in_dim) is simply the shape in which nn.Linear stores its weight matrix. When you call print(model), you can see that the input and output features are correct:

RNN(
  (dense1): Linear(in_features=12, out_features=100, bias=True)
  (dense2): Linear(in_features=100, out_features=100, bias=False)
  (dense3): Linear(in_features=12, out_features=100, bias=True)
  (dense4): Linear(in_features=100, out_features=100, bias=False)
  (dense5): Linear(in_features=100, out_features=6, bias=True)
)
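
You can also verify this directly on a single layer. A quick check, just inspecting the model created above:

# nn.Linear(in_features, out_features) stores its weight transposed,
# i.e. with shape (out_features, in_features)
print(model.dense1.weight.shape)  # torch.Size([100, 12])
print(model.dense1.bias.shape)    # torch.Size([100])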

You can check the source code to see that the weight is actually transposed before matmul is called.

nn.Linear is defined here:
https://pytorch.org/docs/stable/_modules/torch/nn/modules/linear.html#Linear

If you check its forward method, it looks like this:

def forward(self, input):
    return F.linear(input, self.weight, self.bias)


F.linear is defined here:
https://pytorch.org/docs/stable/_modules/torch/nn/functional.html

The relevant line that multiplies by the weight is:

output = input.matmul(weight.t())

As mentioned above, the weight is transposed before matmul is applied, which is why its shape is different from what you expected.
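
In other words, F.linear(input, weight, bias) computes input.matmul(weight.t()) + bias. A minimal sanity check with dummy tensors (the names x, w, and b are just placeholders for this sketch):

import torch
import torch.nn.functional as F

x = torch.rand(5, 12)    # batch of 5 inputs
w = torch.rand(100, 12)  # weight stored as (out_features, in_features)
b = torch.rand(100)      # bias

# F.linear and the manual transpose-then-matmul agree
assert torch.allclose(F.linear(x, w, b), x.matmul(w.t()) + b)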

So if you want to do the matrix multiplication manually, you do:

# dummy batch of 5 inputs, each with 12 features
input = torch.rand(5, 12)
# apply layer dense1 (without the bias; to include it, add + model.dense1.bias)
output_first_layer = input.matmul(model.dense1.weight.t())
print(output_first_layer.shape)

Just as you would expect from dense1, it returns:

torch.Size([5, 100])
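
Since your goal is a NumPy-only forward pass, you can export the parameters with .detach().numpy() and transpose the weight yourself. A sketch for dense1 only, assuming the model above (your full forward pass would chain the remaining layers the same way):

import numpy as np

# extract the parameters of dense1 as NumPy arrays
W1 = model.dense1.weight.detach().numpy()  # shape (100, 12)
b1 = model.dense1.bias.detach().numpy()    # shape (100,)

x = np.random.rand(5, 12)
# multiply by the transpose, exactly as F.linear does internally
out = x @ W1.T + b1
print(out.shape)  # (5, 100)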

I hope this explains your observations about the shapes :)