I am confused about the method view() in the following code snippet.
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        # Flatten the feature maps before the fully connected layers
        x = x.view(-1, 16*5*5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
My confusion is regarding the following line.
x = x.view(-1, 16*5*5)
What does the tensor.view() function do? I have seen it used in many places, but I can't understand how it interprets its parameters.
What happens if I give negative values as parameters to the view() function? For example, what happens if I call tensor_variable.view(1, 1, -1)?
Can anyone explain the main principle of the view() function with some examples?
Let's do some examples, from simpler to more difficult.
The view method returns a tensor with the same data as the self tensor (which means that the returned tensor has the same number of elements), but with a different shape. For example, if a is a 1-D tensor with 16 elements, a.view(4, 4) returns a 4 x 4 tensor over the same data.

Assuming that -1 is not one of the parameters, when you multiply them together, the result must be equal to the number of elements in the tensor. If you do a.view(3, 3), it will raise a RuntimeError because shape (3 x 3) is invalid for input with 16 elements. In other words: 3 x 3 equals 9, not 16.

You can use -1 as one of the parameters that you pass to the function, but only once. The method then infers the size of that dimension from the others. For example, a.view(2, -1, 4) is equivalent to a.view(2, 2, 4), since 16 / (2 x 4) = 2.

Notice that the returned tensor shares the same data. If you make a change in the "view", you are changing the original tensor's data.
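The points above can be checked with a short sketch. The tensor a used here is an assumption (a 1-D tensor holding the values 1 to 16, built with torch.arange), chosen to match the 16-element tensor the text refers to.

```python
import torch

# Assumed example tensor: 16 elements, values 1..16, shape (16,)
a = torch.arange(1, 17)

# Same 16 elements, new shape
b = a.view(4, 4)
print(b.shape)  # torch.Size([4, 4])

# A shape whose sizes do not multiply to 16 is rejected
try:
    a.view(3, 3)
except RuntimeError as err:
    print("RuntimeError:", err)

# -1 asks view() to infer that dimension: 16 / (2 * 4) = 2
c = a.view(2, -1, 4)
print(c.shape)  # torch.Size([2, 2, 4])

# The view shares storage with the original tensor,
# so writing through b is visible in a
b[0, 0] = 100
print(a[0])  # tensor(100)
```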
Now, for a more complex use case. The documentation says that each new view dimension must either be a subspace of an original dimension, or only span original dimensions d, d + 1, ..., d + k that satisfy the following contiguity-like condition: for all i = d, ..., d + k - 1, stride[i] = stride[i + 1] x size[i + 1]. Otherwise, contiguous() needs to be called before the tensor can be viewed. For example, for a non-contiguous tensor a_t with stride[0] = 24, stride[1] = 2 and size[1] = 3, the condition fails: stride[0] != stride[1] x size[1], since 24 != 2 x 3.

The view function is meant to reshape the tensor.
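To make the contiguity condition and the a_t strides quoted above concrete, here is a minimal sketch. The shapes are assumptions: a 4-D tensor of size (5, 4, 3, 2) is permuted so that the resulting strides match the 24, 2 and 3 mentioned above.

```python
import torch

# Assumed shapes, chosen so the strides match the numbers quoted above
a = torch.rand(5, 4, 3, 2)    # contiguous, size (5, 4, 3, 2)
a_t = a.permute(0, 2, 3, 1)   # size (5, 3, 2, 4), same storage, new strides

print(a.stride())            # (24, 6, 2, 1)
print(a_t.stride())          # (24, 2, 1, 6)
print(a_t.is_contiguous())   # False

# stride[0] = 24 but stride[1] * size[1] = 2 * 3 = 6, so dimensions 0 and 1
# of a_t cannot be merged into a single view dimension
try:
    a_t.view(-1, 4)
    view_failed = False
except RuntimeError as err:
    view_failed = True
    print("RuntimeError:", err)

# contiguous() copies the data into a contiguous layout, after which view() works
flat = a_t.contiguous().view(-1, 4)
print(flat.shape)  # torch.Size([30, 4])
```

If you do not need to control whether a copy is made, a_t.reshape(-1, 4) gives the same result: it returns a view when the strides allow it and a copy otherwise.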
Say you have a tensor a that has 16 elements, from 1 to 16 (inclusive). If you want to reshape this tensor to make it a 4 x 4 tensor, then you can use a = a.view(4, 4). Now a will be a 4 x 4 tensor. Note that after the reshape the total number of elements needs to remain the same. Reshaping the tensor a to a 3 x 5 tensor would not work, since 3 x 5 = 15 elements.

What is the meaning of parameter -1?
If you don't know how many rows you want but are sure of the number of columns, you can specify the number of rows as -1. (Note that you can extend this to tensors with more dimensions. Only one of the axis values can be -1.) This is a way of telling the library: "give me a tensor that has this many columns, and you compute the number of rows needed to make this happen".
This can be seen in the neural network code that you have given above. After the line x = self.pool(F.relu(self.conv2(x))) in the forward function, you will have a feature map of depth 16 (16 channels, each of size 5 x 5, matching the 16*5*5 inputs of fc1). You have to flatten this to give it to the fully connected layer. So you tell PyTorch to reshape the tensor to have a specific number of columns (16*5*5 = 400) and let it work out the number of rows, which here is the batch size, by itself.

Drawing a similarity between NumPy and PyTorch, view is similar to NumPy's reshape function.
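The flattening step in the network from the question can be traced with a small sketch. The input here is an assumption: a batch of 4 RGB images of size 32 x 32, which is the size that produces the 5 x 5 feature maps expected by fc1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed input: a batch of 4 RGB images, 32 x 32 pixels each
x = torch.rand(4, 3, 32, 32)

# Same layers as in the question's Net
conv1 = nn.Conv2d(3, 6, 5)
pool = nn.MaxPool2d(2, 2)
conv2 = nn.Conv2d(6, 16, 5)

x = pool(F.relu(conv1(x)))
print(x.shape)  # torch.Size([4, 6, 14, 14])
x = pool(F.relu(conv2(x)))
print(x.shape)  # torch.Size([4, 16, 5, 5])

# Flatten each sample: 16 * 5 * 5 = 400 columns, and -1 becomes the batch size
flat = x.view(-1, 16 * 5 * 5)
print(flat.shape)  # torch.Size([4, 400])
```

A common alternative is x.view(x.size(0), -1), which fixes the batch dimension and lets view() infer the number of columns instead.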