I thought mask_zero=True would output 0's when the input value is 0, so that the following layers could skip computation or something.
How does mask_zero work?
Example:
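import numpy as np
# assuming the Keras bundled with TensorFlow; the question does not say which one was used
from tensorflow.keras.layers import Input, Embedding
from tensorflow.keras.models import Model

data_in = np.array([
    [1, 2, 0, 0]
])
print(data_in.shape)  # (1, 4)

# model
x = Input(shape=(4,))
e = Embedding(5, 5, mask_zero=True)(x)
m = Model(inputs=x, outputs=e)
p = m.predict(data_in)
print(p.shape)
print(p)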
The actual output is: (the numbers are random)
(1, 4, 5)
[[[ 0.02499047 0.04617121 0.01586803 0.0338897 0.009652 ]
[ 0.04782704 -0.04035913 -0.0341589 0.03020919 -0.01157228]
[ 0.00451764 -0.01433611 0.02606953 0.00328832 0.02650392]
[ 0.00451764 -0.01433611 0.02606953 0.00328832 0.02650392]]]
However, I thought the output would be:
[[[ 0.02499047 0.04617121 0.01586803 0.0338897 0.009652 ]
[ 0.04782704 -0.04035913 -0.0341589 0.03020919 -0.01157228]
[ 0 0 0 0 0]
[ 0 0 0 0 0]]]
Actually, setting mask_zero=True for the Embedding layer does not result in returning a zero vector. Rather, the behavior of the Embedding layer does not change, and it returns the embedding vector with index zero. You can confirm this by checking the Embedding layer weights (i.e. in the example you mentioned it would be m.layers[1].get_weights(), since m.layers[0] is the input layer). Instead, it affects the behavior of the following layers, such as RNN layers.

If you inspect the source code of the Embedding layer, you will see a method called compute_mask. The output mask it returns is passed, as the mask argument, to the following layers which support masking. This is implemented in the __call__ method of the base layer, Layer.

This makes the following layers ignore (i.e. not consider in their computations) these input steps. Here is a minimal example:
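A minimal sketch of such an example, assuming the Keras bundled with TensorFlow and an input of [1, 0, 2, 0] so that the zeros fall on the second and fourth timesteps (the variable names emb, mask and out are only illustrative):

```python
import numpy as np
from tensorflow.keras.layers import Input, Embedding, LSTM
from tensorflow.keras.models import Model

# Zeros at the second and fourth timesteps are the ones to be masked.
data_in = np.array([[1, 0, 2, 0]])

x = Input(shape=(4,))
emb = Embedding(5, 5, mask_zero=True)
e = emb(x)
rnn = LSTM(3, return_sequences=True)(e)

m = Model(inputs=x, outputs=rnn)
out = m.predict(data_in, verbose=0)

# The mask computed by the Embedding layer: True wherever the input is non-zero.
mask = np.array(emb.compute_mask(data_in))
print(mask)  # [[ True False  True False]]

# LSTM output per timestep; the actual numbers depend on the random initialization.
print(out)
```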
As you can see, the outputs of the LSTM layer for the second and fourth timesteps are the same as the outputs of the first and third timesteps, respectively. This means that those timesteps have been masked.
Update: The mask is also taken into account when computing the loss, since the loss functions are internally augmented to support masking, using weighted_masked_objective, when compiling the model.
You can verify this using the following example:
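A minimal sketch of one way to check this, assuming the Keras bundled with TensorFlow. Instead of recomputing the loss by hand (the exact normalization over masked steps has differed between Keras versions), it evaluates the same model against two hypothetical targets, target_a and target_b, that differ only at the masked timesteps:

```python
import numpy as np
from tensorflow.keras.layers import Input, Embedding, Dense
from tensorflow.keras.models import Model

data_in = np.array([[1, 2, 0, 0]])  # the last two timesteps are masked

x = Input(shape=(4,))
e = Embedding(5, 5, mask_zero=True)(x)
d = Dense(1)(e)  # Dense supports masking and passes the mask through

m = Model(inputs=x, outputs=d)
m.compile(loss='mse', optimizer='adam')

# Two targets that differ only at the masked (third and fourth) timesteps.
target_a = np.array([[[1.0], [2.0], [3.0], [4.0]]])
target_b = np.array([[[1.0], [2.0], [300.0], [-400.0]]])

loss_a = m.evaluate(data_in, target_a, verbose=0)
loss_b = m.evaluate(data_in, target_b, verbose=0)
print('Loss with target_a:', loss_a)
print('Loss with target_b:', loss_b)
```

If the mask is applied to the loss, the two values should match even though the targets differ wildly at the masked positions; changing mask_zero to False should make them diverge.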