In the Keras docs for Embedding (https://keras.io/layers/embeddings/), the explanation given for mask_zero is:
mask_zero: Whether or not the input value 0 is a special "padding" value that should be masked out. This is useful when using recurrent layers which may take variable length input. If this is True then all subsequent layers in the model need to support masking or an exception will be raised. If mask_zero is set to True, as a consequence, index 0 cannot be used in the vocabulary (input_dim should equal |vocabulary| + 2).
Why does input_dim need to be 2 + number of words in vocabulary? Assuming 0 is masked and can't be used, shouldn't it just be 1 + number of words? What is the other extra entry for?
Because input_dim is already +1 relative to the vocabulary (it is defined as the maximum integer index + 1); you just add another +1 for the reserved index 0 and end up with the +2.
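
To make that counting concrete, here is a small sketch of the arithmetic (my own illustration, not from the post, with a made-up three-word vocabulary):

    vocabulary = ["cat", "sat", "mat"]       # 3 words

    # Without masking: words are indexed 0..2, one embedding row per word.
    max_index = len(vocabulary) - 1          # 2
    input_dim_no_mask = max_index + 1        # 3, i.e. len(vocabulary)

    # With mask_zero=True: words are shifted to indices 1..3 and 0 is reserved
    # for padding, so the layer needs one extra row.
    input_dim_masked = len(vocabulary) + 1   # 4
    assert input_dim_masked == max_index + 2 # the docs' "+2" counts from the
                                             # maximum 0-based index, not from
                                             # len(vocabulary)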
I believe the docs are a bit misleading there. In the normal case you are mapping your n input data indices [0, 1, 2, ..., n-1] to vectors, so your input_dim should be as many elements as you have:

    input_dim = len(vocabulary)

An equivalent (but slightly confusing) way to say this, and the way the docs do, is to say that it should be

    1 + maximum integer index occurring in the input data.

If you enable masking, value 0 is treated differently, so you increment your n indices by one; together with the reserved 0, the layer now has to cover the indices [0, 1, 2, ..., n-1, n], thus you need

    input_dim = len(vocabulary) + 1

or alternatively

    input_dim = 2 + maximum integer index occurring in the (original, unshifted) input data.

The docs become especially confusing here as they say

    input_dim should equal |vocabulary| + 2

where I would interpret |x| as the cardinality of a set (equivalent to len(x)), but the authors seem to mean the maximum integer index occurring in the input data, not the number of distinct words; read that way, |vocabulary| + 2 works out to the same len(vocabulary) + 1 as above.
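
As a sanity check of the len(vocabulary) + 1 sizing, here is a minimal sketch of my own (not from the answer), using tensorflow.keras; the five-word vocabulary and the layer sizes are made up for illustration:

    import numpy as np
    from tensorflow.keras.layers import Embedding, LSTM
    from tensorflow.keras.models import Sequential

    vocabulary = ["cat", "sat", "on", "the", "mat"]            # 5 real tokens
    word_index = {w: i + 1 for i, w in enumerate(vocabulary)}  # indices 1..5; 0 is padding

    model = Sequential([
        Embedding(input_dim=len(vocabulary) + 1,  # 6 rows: one per token plus padding row 0
                  output_dim=8,
                  mask_zero=True),
        LSTM(4),
    ])

    # Two sequences zero-padded to length 4; the zeros are masked out downstream.
    batch = np.array([
        [word_index[w] for w in ["the", "cat", "sat"]] + [0],
        [word_index[w] for w in ["on", "the"]] + [0, 0],
    ])
    print(model(batch).shape)  # (2, 4)

With mask_zero=True the LSTM receives the mask automatically and ignores the padded timesteps; if input_dim were only len(vocabulary), the highest token index (5 here) would fall outside the embedding table.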