LSTM/GRU autoencoder convergence

Posted 2020-03-31 08:06

Question:

Goal

I am running into a strange situation while trying to build an efficient autoencoder for my time series dataset:
X_train (200, 23, 178) X_val (100, 23, 178) X_test (100, 23, 178)

Current situation

With a simple autoencoder I get better results than with my simple LSTM AE on a dataset of time series.
I have some concerns about my use of the RepeatVector wrapper layer which, as far as I understand, is supposed to repeat the last state of the LSTM/GRU cell a number of times equal to the sequence length, so that it matches the input shape expected by the decoder layer.

The model does not raise any error, but the results are still an order of magnitude worse than with the simple AE, whereas I would expect them to be at least comparable, since I am using an architecture that should fit the domain problem better. Moreover, the reconstruction does not look good at all, just noise.
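To make sure I am reading RepeatVector correctly, this is the shape behaviour I am relying on (a minimal, illustrative check, not part of my model code):

import numpy as np
import tensorflow as tf

# RepeatVector tiles the encoder's last state along a new time axis,
# so the decoder receives a sequence of the expected length.
state = tf.constant(np.ones((1, 59), dtype="float32"))   # (batch, latent_dim)
repeated = tf.keras.layers.RepeatVector(23)(state)       # (batch, 23, latent_dim)
print(repeated.shape)                                     # (1, 23, 59)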

My AE model:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 178)               31862     
_________________________________________________________________
batch_normalization (BatchNo (None, 178)               712       
_________________________________________________________________
dense_1 (Dense)              (None, 59)                10561     
_________________________________________________________________
dense_2 (Dense)              (None, 178)               10680     
=================================================================
  • optimizer: sgd
  • loss: mse
  • activation function of the dense layers: relu
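
For completeness, a minimal Keras sketch that matches this summary (the actual code is not shown here, so the activations and exact layer arguments are assumptions based on the bullet points above):

from tensorflow.keras import layers, models

# Hypothetical reconstruction of the dense AE summarized above.
inp = layers.Input(shape=(178,))
x = layers.Dense(178, activation="relu")(inp)
x = layers.BatchNormalization()(x)
x = layers.Dense(59, activation="relu")(x)        # bottleneck
out = layers.Dense(178, activation="relu")(x)     # reconstruction
dense_ae = models.Model(inp, out)
dense_ae.compile(optimizer="sgd", loss="mse")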

My LSTM/GRU AE:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 23, 178)           0         
_________________________________________________________________
gru (GRU)                    (None, 59)                42126     
_________________________________________________________________
repeat_vector (RepeatVector) (None, 23, 59)            0         
_________________________________________________________________
gru_1 (GRU)                  (None, 23, 178)           127092    
_________________________________________________________________
time_distributed (TimeDistri (None, 23, 178)           31862     
=================================================================
  • optimizer: sgd
  • loss: mse
  • activation function of the gru layers: relu
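
And a corresponding sketch for the recurrent AE (again a reconstruction from the summary, not the exact code; the printed parameter counts can also vary slightly with the GRU's reset_after setting):

from tensorflow.keras import layers, models

# Hypothetical reconstruction of the GRU AE summarized above.
inp = layers.Input(shape=(23, 178))
encoded = layers.GRU(59, activation="relu")(inp)               # last state only: (batch, 59)
repeated = layers.RepeatVector(23)(encoded)                    # (batch, 23, 59)
decoded = layers.GRU(178, activation="relu",
                     return_sequences=True)(repeated)          # (batch, 23, 178)
out = layers.TimeDistributed(layers.Dense(178))(decoded)       # per-timestep reconstruction
gru_ae = models.Model(inp, out)
gru_ae.compile(optimizer="sgd", loss="mse")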

Am I making some fundamental error in my assumptions about how these recurrent layers work? Or do you have any suggestions on how to debug this?

Answer 1:

The two models you have above do not seem to be comparable in a meaningful way. The first model attempts to compress your vectors of 178 values. It is quite possible that these vectors contain some redundant information, so it is reasonable to assume that you will be able to compress them.

The second model attempts to compress a sequence of 23 vectors of 178 values each through a single GRU layer. This is a task with a significantly higher number of parameters. The RepeatVector simply takes the output of the 1st GRU layer (the encoder) and makes it the input of the 2nd GRU layer (the decoder). But then you take only a single value of the decoder. Instead of the TimeDistributed layer, I'd recommend that you use return_sequences=True in the 2nd GRU (the decoder). Otherwise you are saying that you expect the 23x178 sequence to be made up of elements that all have the same value; that has to lead to a very high error / no solution.
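
As a concrete, purely illustrative sketch of what I mean (layer sizes taken from your summary):

from tensorflow.keras import layers, models

# The decoder GRU returns the full sequence, so every timestep of the
# 23 x 178 target gets its own reconstruction.
inp = layers.Input(shape=(23, 178))
encoded = layers.GRU(59)(inp)                                # encoder: last state
repeated = layers.RepeatVector(23)(encoded)
decoded = layers.GRU(178, return_sequences=True)(repeated)   # one output per timestep
seq_ae = models.Model(inp, decoded)
seq_ae.compile(optimizer="sgd", loss="mse")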

I'd recommend you take a step back. Is your goal to find similarity between the sequences, or to be able to make predictions? An autoencoder approach is preferable for a similarity task. In order to make predictions, I'd recommend that you move more towards an approach where you apply a Dense(1) layer to the output of the sequence steps.
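
For the prediction route, a rough sketch of that idea (illustrative only; the sizes are placeholders, not a prescription):

from tensorflow.keras import layers, models

# Keep the per-timestep outputs and map each one to a single
# predicted value with Dense(1).
inp = layers.Input(shape=(23, 178))
seq = layers.GRU(59, return_sequences=True)(inp)   # one hidden vector per timestep
pred = layers.Dense(1)(seq)                        # applied per timestep: (batch, 23, 1)
predictor = models.Model(inp, pred)
predictor.compile(optimizer="sgd", loss="mse")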

Is your dataset open / available? I'd be curious to take it for a spin if that were possible.