I'm trying to understand the difference between the model described here, the following one:
from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model

timesteps = 10   # length of each input sequence (example value)
input_dim = 32   # number of features per time step (example value)
latent_dim = 64  # size of the encoded representation (example value)

inputs = Input(shape=(timesteps, input_dim))
encoded = LSTM(latent_dim)(inputs)                          # encode the sequence into a single vector
decoded = RepeatVector(timesteps)(encoded)                  # repeat that vector timesteps times
decoded = LSTM(input_dim, return_sequences=True)(decoded)   # decode back into a sequence
sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)
and the sequence-to-sequence model described here (the second description).
What is the difference? That the first one uses RepeatVector while the second does not? Is the first model not taking the decoder's hidden state as the initial state for the prediction?
Is there a paper describing the first and the second one?
In the model using RepeatVector, they're not doing any kind of fancy step-by-step prediction, nor dealing with states. They're letting the model do everything internally, and RepeatVector is used to transform a (batch, latent_dim) vector (which is not a sequence) into a (batch, timesteps, latent_dim) tensor (which is now a proper sequence).
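To see concretely what RepeatVector does to the shapes, here's a minimal standalone check (the dimensions are just example values, not anything from the original post):

from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model

inp = Input(shape=(10, 32))   # (batch, timesteps=10, input_dim=32)
vec = LSTM(64)(inp)           # -> (batch, 64): one vector per sample, no time axis
seq = RepeatVector(10)(vec)   # -> (batch, 10, 64): the same vector copied 10 times
m = Model(inp, seq)
print(m.output_shape)         # (None, 10, 64)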
Now, in the other model, without RepeatVector, the secret lies in an additional decoding function. It runs a "loop" based on a stop_condition, creating the time steps one by one. (The advantage of this is being able to produce sequences without a fixed length.) It also explicitly takes the states generated in each step and passes them to the next one (in order to keep the proper connection between the individual steps).
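That function isn't reproduced here, but a minimal sketch of such a decoding loop, following the pattern of the Keras seq2seq tutorial (the model names, token indices, and dimensions below are assumptions for illustration), looks like this:

import numpy as np

def decode_sequence(input_seq, encoder_model, decoder_model,
                    num_decoder_tokens, start_index, stop_index,
                    max_steps=100):
    # Encode the input once to get the initial decoder states [h, c].
    states = encoder_model.predict(input_seq)

    # Seed the decoder with a one-hot start token.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    target_seq[0, 0, start_index] = 1.0

    decoded = []
    stop_condition = False
    while not stop_condition:
        # Predict one time step from the previous token and the current states.
        output_tokens, h, c = decoder_model.predict([target_seq] + states)
        sampled = int(np.argmax(output_tokens[0, -1, :]))
        decoded.append(sampled)

        # Stop on the end token or when the sequence gets too long.
        if sampled == stop_index or len(decoded) >= max_steps:
            stop_condition = True

        # Feed the sampled token back in as the next input step...
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled] = 1.0
        # ...and carry the states over, keeping each step connected to the last.
        states = [h, c]

    return decoded

This step-by-step feedback of tokens and states is exactly what the RepeatVector autoencoder never does.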
In short: