
Getting an error while adding an embedding layer to an LSTM

Posted 2020-04-10 09:48

Question:

I have a seq2seq model which is working fine. I want to add an embedding layer to this network, but I ran into an error.

This is my architecture using pretrained word embeddings, and it works fine (the code is almost the same as the code available here, but I want to include the Embedding layer in the model rather than using the pretrained embedding vectors):

import os
from keras.callbacks import ModelCheckpoint
from keras.layers import Input, Embedding, Bidirectional, LSTM, Lambda, RepeatVector
from keras.models import Model

LATENT_SIZE = 20

# Inputs are sequences of pretrained embedding vectors: (SEQUENCE_LEN, EMBED_SIZE)
inputs = Input(shape=(SEQUENCE_LEN, EMBED_SIZE), name="input")

encoded = Bidirectional(LSTM(LATENT_SIZE), merge_mode="sum", name="encoder_lstm")(inputs)
encoded = Lambda(rev_ent)(encoded)
decoded = RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
decoded = Bidirectional(LSTM(EMBED_SIZE, return_sequences=True), merge_mode="sum", name="decoder_lstm")(decoded)
autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer="sgd", loss='mse')
autoencoder.summary()

NUM_EPOCHS = 1
num_train_steps = len(Xtrain) // BATCH_SIZE
num_test_steps = len(Xtest) // BATCH_SIZE

checkpoint = ModelCheckpoint(filepath=os.path.join('Data/', "simple_ae_to_compare"), save_best_only=True)
history = autoencoder.fit_generator(train_gen, steps_per_epoch=num_train_steps, epochs=NUM_EPOCHS,
                                    validation_data=test_gen, validation_steps=num_test_steps,
                                    callbacks=[checkpoint])

This is the summary:

Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           (None, 45, 50)            0         
_________________________________________________________________
encoder_lstm (Bidirectional) (None, 20)                11360     
_________________________________________________________________
lambda_1 (Lambda)            (512, 20)                 0         
_________________________________________________________________
repeater (RepeatVector)      (512, 45, 20)             0         
_________________________________________________________________
decoder_lstm (Bidirectional) (512, 45, 50)             28400  

When I change the code to add the embedding layer like this:

inputs = Input(shape=(SEQUENCE_LEN,), name="input")

embedding = Embedding(output_dim=EMBED_SIZE, input_dim=VOCAB_SIZE, input_length=SEQUENCE_LEN, trainable=True)(inputs)
encoded = Bidirectional(LSTM(LATENT_SIZE), merge_mode="sum", name="encoder_lstm")(embedding)

I received this error:

expected decoder_lstm to have 3 dimensions, but got array with shape (512, 45)

So my question is: what is wrong with my model?

Update

So, this error is raised in the training phase. I also checked the dimensions of the data being fed to the model; it is (61598, 45), which clearly does not include the feature dimension (EMBED_SIZE here).

But why is this error raised in the decoder part? In the encoder part I have included the Embedding layer, so that part is fine; but when the data reaches the decoder part, there is no embedding layer, so it cannot be correctly reshaped to three dimensions.

Now the question is why this does not happen in similar code. This is my view, correct me if I'm wrong: seq2seq code is usually used for translation or summarization, and in those codes the decoder part also has its own input (in the translation case, the other language is fed to the decoder, so having an embedding in the decoder part makes sense there). Here I do not have a separate input, which is why I do not need a separate embedding in the decoder part. However, I don't know how to fix the problem; I only know why it is happening :|
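To make the mismatch concrete, here is a rough sketch of the shapes as I understand them (assuming BATCH_SIZE = 512 as in the summary above; the array names are just for illustration):

import numpy as np

BATCH_SIZE, SEQUENCE_LEN, EMBED_SIZE = 512, 45, 50

x_batch = np.zeros((BATCH_SIZE, SEQUENCE_LEN), dtype="int32")   # word ids: what the Embedding input expects
y_needed = np.zeros((BATCH_SIZE, SEQUENCE_LEN, EMBED_SIZE))     # 3-D target that the mse loss on decoder_lstm expects
y_fed = x_batch                                                 # 2-D word ids currently used as target -> the error above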

Update 2

This is the data being fed to the model:

sent_wids = np.zeros((len(parsed_sentences),SEQUENCE_LEN),'int32')
sample_seq_weights = np.zeros((len(parsed_sentences),SEQUENCE_LEN),'float')
for index_sentence in range(len(parsed_sentences)):
    temp_sentence = parsed_sentences[index_sentence]
    temp_words = nltk.word_tokenize(temp_sentence)
    for index_word in range(SEQUENCE_LEN):
        if index_word < sent_lens[index_sentence]:
            sent_wids[index_sentence,index_word] = lookup_word2id(temp_words[index_word])
        else:
            sent_wids[index_sentence, index_word] = lookup_word2id('PAD')

def sentence_generator(X,embeddings, batch_size, sample_weights):
    while True:
        # loop once per epoch
        num_recs = X.shape[0]
        indices = np.random.permutation(np.arange(num_recs))
        # print(embeddings.shape)
        num_batches = num_recs // batch_size
        for bid in range(num_batches):
            sids = indices[bid * batch_size : (bid + 1) * batch_size]
            temp_sents = X[sids, :]
            Xbatch = embeddings[temp_sents]
            weights = sample_weights[sids, :]
            yield Xbatch, Xbatch

LATENT_SIZE = 60

train_size = 0.95
split_index = int(math.ceil(len(sent_wids)*train_size))
Xtrain = sent_wids[0:split_index, :]
Xtest = sent_wids[split_index:, :]
train_w = sample_seq_weights[0: split_index, :]
test_w = sample_seq_weights[split_index:, :]
train_gen = sentence_generator(Xtrain, embeddings, BATCH_SIZE,train_w)
test_gen = sentence_generator(Xtest, embeddings , BATCH_SIZE,test_w)

parsed_sentences consists of 61598 sentences, which are padded.

Also, this is the function I use in the model as the Lambda layer; I have added it here in case it has any effect:

def rev_entropy(x):
        def row_entropy(row):
            _, _, count = tf.unique_with_counts(row)
            count = tf.cast(count,tf.float32)
            prob = count / tf.reduce_sum(count)
            prob = tf.cast(prob,tf.float32)
            rev = -tf.reduce_sum(prob * tf.log(prob))
            return rev

        nw = tf.reduce_sum(x,axis=1)
        rev = tf.map_fn(row_entropy, x)
        rev = tf.where(tf.is_nan(rev), tf.zeros_like(rev), rev)
        rev = tf.cast(rev, tf.float32)
        max_entropy = tf.log(tf.clip_by_value(nw,2,LATENT_SIZE))
        concentration = (max_entropy/(1+rev))
        new_x = x * (tf.reshape(concentration, [BATCH_SIZE, 1]))
        return new_x

Any help is appreciated:)

Answer 1:

I tried the following example on Google Colab (TensorFlow version 1.13.1):

from tensorflow.python import keras
import numpy as np

SEQUENCE_LEN = 45
LATENT_SIZE = 20
EMBED_SIZE = 50
VOCAB_SIZE = 100
NUM_EPOCHS = 1

inputs = keras.layers.Input(shape=(SEQUENCE_LEN,), name="input")

embedding = keras.layers.Embedding(output_dim=EMBED_SIZE, input_dim=VOCAB_SIZE, input_length=SEQUENCE_LEN, trainable=True)(inputs)

encoded = keras.layers.Bidirectional(keras.layers.LSTM(LATENT_SIZE), merge_mode="sum", name="encoder_lstm")(embedding)
decoded = keras.layers.RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
decoded = keras.layers.Bidirectional(keras.layers.LSTM(EMBED_SIZE, return_sequences=True), merge_mode="sum", name="decoder_lstm")(decoded)
autoencoder = keras.models.Model(inputs, decoded)
autoencoder.compile(optimizer="sgd", loss='mse')
autoencoder.summary()

And then trained the model using some random data:


x = np.random.randint(0, 90, size=(10, 45))
y = np.random.normal(size=(10, 45, 50))
history = autoencoder.fit(x, y, epochs=NUM_EPOCHS)

This solution worked fine. I feel like the issue might be the way you are feeding in labels/outputs for MSE calculation.

Update

Context

In the original problem, you are attempting to reconstruct word embeddings using a seq2seq model, where the embeddings are fixed and pre-trained. However, if you want to use a trainable embedding layer as part of the model, it becomes very difficult to formulate the problem, because you no longer have fixed targets (i.e. the targets change on every single iteration of the optimization, since your embedding layer is changing). Furthermore, this leads to a very unstable optimization problem, because the targets are changing all the time.
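As a rough illustration of this moving-target effect (reusing x, y and autoencoder from the example above; this is only a sanity check, not part of the fix), you can watch the embedding weights, and therefore any targets derived from them, change after a single training step:

emb_layer = autoencoder.get_layer(index=1)      # the Embedding layer (layer 0 is the Input)
w_before = emb_layer.get_weights()[0].copy()
autoencoder.train_on_batch(x, y)                # one optimization step
w_after = emb_layer.get_weights()[0]
print(np.abs(w_after - w_before).max())         # non-zero: embedding-derived targets would keep shifting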

Fixing your code

If you do the following, you should be able to get the code working. Here embeddings is the numpy.ndarray of pre-trained GloVe vectors.

def sentence_generator(X, embeddings, batch_size):
    while True:
        # loop once per epoch
        num_recs = X.shape[0]
        embed_size = embeddings.shape[1]
        indices = np.random.permutation(np.arange(num_recs))
        # print(embeddings.shape)
        num_batches = num_recs // batch_size
        for bid in range(num_batches):
            sids = indices[bid * batch_size : (bid + 1) * batch_size]
            # Xbatch is a [batch_size, seq_length] array
            Xbatch = X[sids, :] 

            # Creating the Y targets
            Xembed = embeddings[Xbatch.reshape(-1),:]
            # Ybatch will be [batch_size, seq_length, embed_size] array
            Ybatch = Xembed.reshape(batch_size, -1, embed_size)
            yield Xbatch, Ybatch
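Then, with the embedding-layer model defined above, you can plug this generator into training the same way as in your original code (a sketch, assuming BATCH_SIZE = 512 and the same Xtrain/Xtest split from your question):

BATCH_SIZE = 512   # or whatever batch size you are actually using

train_gen = sentence_generator(Xtrain, embeddings, BATCH_SIZE)
test_gen = sentence_generator(Xtest, embeddings, BATCH_SIZE)

num_train_steps = len(Xtrain) // BATCH_SIZE
num_test_steps = len(Xtest) // BATCH_SIZE

history = autoencoder.fit_generator(train_gen, steps_per_epoch=num_train_steps, epochs=NUM_EPOCHS,
                                    validation_data=test_gen, validation_steps=num_test_steps)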