Goal
I have a strange situation trying to create an efficient autoencoder over my time series dataset:
X_train (200, 23, 178)
X_val (100, 23, 178)
X_test (100, 23, 178)
Current situation
A simple dense autoencoder gives me better results than my LSTM/GRU autoencoder on this time-series dataset.
I have some concerns about my use of the RepeatVector layer, which, as far as I understand, is supposed to repeat the last state of the LSTM/GRU cell as many times as the sequence length, so that the input shape matches what the decoder layer expects.
The model does not raise any errors, but its results are an order of magnitude worse than those of the simple AE, whereas I would expect at least comparable performance, since a recurrent architecture should fit the problem domain better. Moreover, the reconstruction does not look good at all; it is just noise.
My AE model:
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 178) 31862
_________________________________________________________________
batch_normalization (BatchNo (None, 178) 712
_________________________________________________________________
dense_1 (Dense) (None, 59) 10561
_________________________________________________________________
dense_2 (Dense) (None, 178) 10680
=================================================================
- optimizer: sgd
- loss: mse
- activation function of the dense layers: relu
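For reference, the dense AE summarized above can be rebuilt roughly like this. This is a minimal sketch assuming the Keras functional API and the layer order shown in the summary (Dense -> BatchNormalization -> bottleneck Dense -> output Dense); the function name and default sizes are mine:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_dense_ae(n_features=178, bottleneck=59):
    # Per-timestep dense AE: operates on 178-dimensional feature vectors
    inp = keras.Input(shape=(n_features,))
    x = layers.Dense(n_features, activation="relu")(inp)
    x = layers.BatchNormalization()(x)
    x = layers.Dense(bottleneck, activation="relu")(x)    # bottleneck (59)
    out = layers.Dense(n_features, activation="relu")(x)  # reconstruction
    model = keras.Model(inp, out)
    model.compile(optimizer="sgd", loss="mse")
    return model
```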
My LSTM/GRU AE:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 23, 178) 0
_________________________________________________________________
gru (GRU) (None, 59) 42126
_________________________________________________________________
repeat_vector (RepeatVector) (None, 23, 59) 0
_________________________________________________________________
gru_1 (GRU) (None, 23, 178) 127092
_________________________________________________________________
time_distributed (TimeDistri (None, 23, 178) 31862
=================================================================
- optimizer: sgd
- loss: mse
- activation function of the gru layers: relu
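The GRU AE summarized above can be sketched as follows. This is an assumed reconstruction from the summary (function name and argument defaults are mine): the encoder GRU returns only its final state, RepeatVector tiles it across the 23 timesteps, and a TimeDistributed Dense maps each decoder output back to feature space:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_gru_ae(timesteps=23, n_features=178, latent=59):
    inp = keras.Input(shape=(timesteps, n_features))
    # Encoder: return_sequences=False (default), so only the
    # last hidden state (shape (None, 59)) is kept
    encoded = layers.GRU(latent, activation="relu")(inp)
    # Repeat the latent vector once per timestep -> (None, 23, 59)
    x = layers.RepeatVector(timesteps)(encoded)
    # Decoder: emit a hidden state at every timestep -> (None, 23, 178)
    x = layers.GRU(n_features, activation="relu", return_sequences=True)(x)
    # Map each timestep's hidden state back to feature space
    out = layers.TimeDistributed(layers.Dense(n_features))(x)
    model = keras.Model(inp, out)
    model.compile(optimizer="sgd", loss="mse")
    return model
```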
Am I making some fundamental error in my assumptions about these recurrent layers? Or do you have any suggestions on how to debug this?