I have a CNN-RNN model architecture with bidirectional LSTMs for a time-series regression problem. My loss does not converge over 50 epochs. Each epoch has 20k samples. The loss keeps bouncing between 0.001 and 0.01.
batch_size = 1
epochs = 50
model.compile(loss='mean_squared_error', optimizer='adam')
trainingHistory = model.fit(trainX, trainY, epochs=epochs, batch_size=batch_size, shuffle=False)
- I tried training the model with incorrectly paired X and Y data, for which the loss stays around 0.5. Is it reasonable to conclude that my X and Y have a non-linear relationship that my model can learn over more epochs?
- The predictions of my model capture the pattern, but with an offset. I use dynamic time warping (DTW) distance to manually check the accuracy of the predictions. Is there a better way?
Model:
model = Sequential()
model.add(LSTM(units=128, dropout=0.05, recurrent_dropout=0.35, return_sequences=True, batch_input_shape=(batch_size,featureSteps,input_dim)))
model.add(LSTM(units=32, dropout=0.05, recurrent_dropout=0.35, return_sequences=False))
model.add(Dense(units=2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
If you tested with:
- wrongly paired X and Y: loss stays around 0.5
- correctly paired X and Y: loss falls into the 0.001 - 0.01 range
Then your model is actually capable of learning something.
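For reference, a minimal sketch of that sanity check, assuming the compiled model and the trainX / trainY arrays from the question (the epoch count here is arbitrary):

import numpy as np

# Destroy the X-Y pairing by shuffling only the targets; the loss on these
# wrong pairs should stay high (here ~0.5), while the loss on the correctly
# paired data keeps decreasing.
wrongY = trainY[np.random.permutation(len(trainY))]
sanityHistory = model.fit(trainX, wrongY, epochs=5, batch_size=batch_size, shuffle=False)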
There are a few possibilities:
Case 1:
Make sure your Y is within the range of your last activation function:
- tanh (the LSTM's default): all Y data should be between -1 and +1
- sigmoid: between 0 and 1
- softmax: between 0 and 1, but make sure your last dimension is not 1, otherwise all results will always be 1
- relu: between 0 and infinity
- linear: any value
Convergence goes better if you have a limited activation instead of one that goes to infinity.
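As an illustration, a minimal sketch of checking the target range and, if needed, min-max scaling it into [0, 1] for a bounded output such as sigmoid (the scaling itself is an assumption about your preprocessing, not something from the question):

# Check whether the targets actually fit the output activation's range.
print(trainY.min(), trainY.max())

# Min-max scale Y into [0, 1]; keep the constants to invert the scaling later.
yMin, yMax = trainY.min(), trainY.max()
trainY_scaled = (trainY - yMin) / (yMax - yMin)

# After predicting, map the outputs back to the original units:
# predictions = model.predict(testX) * (yMax - yMin) + yMin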
Case 2:
If the data is ok, try decreasing the learning rate once your model stagnates.
The default learning rate for Adam in Keras is 0.001; you can recompile the model (after training) with the rate divided by 10 and continue fitting, as in the sketch below.
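A minimal sketch, assuming a tf.keras setup and the mean-squared-error loss from the question; recompiling keeps the trained weights, so the second fit simply continues from where training stopped:

from tensorflow.keras.optimizers import Adam

# After training enough with the default rate (0.001), recompile with a
# 10x smaller learning rate and keep training the same weights.
model.compile(loss='mean_squared_error', optimizer=Adam(learning_rate=0.0001))
trainingHistory2 = model.fit(trainX, trainY, epochs=epochs, batch_size=batch_size, shuffle=False)

# If the loss stagnates again, repeat with an even smaller rate:
# model.compile(loss='mean_squared_error', optimizer=Adam(learning_rate=0.00001))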
If the problem was the learning rate, this will make your model learn more than it already did (there may be some difficulty at the beginning, until the optimizer adjusts itself).
Case 3:
If you got no success, maybe it's time to increase the model's capacity: add more units to the layers, add more layers, or even change the architecture, for example as sketched below.
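For example, a sketch of a higher-capacity variant of the posted model, assuming a single regression target; the layer sizes and the Bidirectional wrappers are illustrative choices, not a recommendation:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Bidirectional, Dense

# Wider recurrent layers, wrapped in Bidirectional, with a linear output
# suited to a regression target (featureSteps and input_dim as in the question).
bigger = Sequential()
bigger.add(Bidirectional(LSTM(units=256, return_sequences=True),
                         input_shape=(featureSteps, input_dim)))
bigger.add(Bidirectional(LSTM(units=64)))
bigger.add(Dense(units=1, activation='linear'))
bigger.compile(loss='mean_squared_error', optimizer='adam')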
Case 4:
If what remains is noise or randomness in the data itself, there's probably nothing you can do about it...
But if you increased the model as in case 3, be careful with overfitting (keep some test data to compare the test loss against the training loss).
A model that is too powerful can simply memorize your data instead of learning important insights about it.
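A minimal sketch of that comparison, assuming a held-out split taken from the end of the data; the 80/20 ratio and the variable names are illustrative:

# Hold out the last 20% of the samples as validation data.
split = int(len(trainX) * 0.8)
subX, subY = trainX[:split], trainY[:split]
valX, valY = trainX[split:], trainY[split:]

# Track both losses; a training loss far below the validation loss is the
# usual sign that the model is memorizing rather than generalizing.
history = model.fit(subX, subY, epochs=epochs, batch_size=batch_size,
                    shuffle=False, validation_data=(valX, valY))
print(history.history['loss'][-1], history.history['val_loss'][-1])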