This question is in continue to a previous question I've asked.
I've trained an LSTM model to predict a binary class (1 or 0) for batches of 100 samples with 3 features each, i.e: the shape of the data is (m, 100, 3), where m is the number of batches.
Data:
[
[[1,2,3],[1,2,3]... 100 sampels],
[[1,2,3],[1,2,3]... 100 sampels],
... avaialble batches in the training data
]
Target:
[
[1]
[0]
...
]
Model code:
def build_model(num_samples, num_features, is_training):
model = Sequential()
opt = optimizers.Adam(lr=0.0005, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0001)
batch_size = None if is_training else 1
stateful = False if is_training else True
first_lstm = LSTM(32, batch_input_shape=(batch_size, num_samples, num_features), return_sequences=True,
activation='tanh', stateful=stateful)
model.add(first_lstm)
model.add(LeakyReLU())
model.add(Dropout(0.2))
model.add(LSTM(16, return_sequences=True, activation='tanh', stateful=stateful))
model.add(Dropout(0.2))
model.add(LeakyReLU())
model.add(LSTM(8, return_sequences=False, activation='tanh', stateful=stateful))
model.add(LeakyReLU())
model.add(Dense(1, activation='sigmoid'))
if is_training:
model.compile(loss='binary_crossentropy', optimizer=opt,
metrics=['accuracy', keras_metrics.precision(), keras_metrics.recall(), f1])
return model
For the training stage, the model is NOT stateful. When predicting I'm using a stateful model, iterating over the data and outputting a probability for each sample:
for index, row in data.iterrows():
if index % 100 == 0:
predicting_model.reset_states()
vals = np.array([[row[['a', 'b', 'c']].values]])
prob = predicting_model.predict_on_batch(vals)
When looking at the probability at the end of a batch, it is exactly the value I get when predicting with the entire batch (not one by one). However, I've expected that the probability will always continue in the right direction when new samples arrive. What actually happens is that the probability output can spike to the wrong class on an arbitrary sample (see below).
Two samples of 100 sample batches over the time of prediction (label = 1):
and Label = 0:
Is there a way to achieve what I want (avoid extreme spikes while predicting probability), or is that a given fact?
Any explanation, advice would be appreciated.
Update
Thanks to @today advice, I've tried training the network with the hidden state output for each input time step using return_sequence=True
on the last LSTM layer.
So now the labels look like so (shape (100,100)):
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
...]
the model summary:
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 100, 32) 4608
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU) (None, 100, 32) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 100, 32) 0
_________________________________________________________________
lstm_2 (LSTM) (None, 100, 16) 3136
_________________________________________________________________
dropout_2 (Dropout) (None, 100, 16) 0
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU) (None, 100, 16) 0
_________________________________________________________________
lstm_3 (LSTM) (None, 100, 8) 800
_________________________________________________________________
leaky_re_lu_3 (LeakyReLU) (None, 100, 8) 0
_________________________________________________________________
dense_1 (Dense) (None, 100, 1) 9
=================================================================
Total params: 8,553
Trainable params: 8,553
Non-trainable params: 0
_________________________________________________________________
However, I get an exception:
ValueError: Error when checking target: expected dense_1 to have 3 dimensions, but got array with shape (75, 100)
What do I need to fix?