Suppose that I have a model like this (this is a model for time series forecasting):
from tensorflow.keras.layers import Input, Conv1D, LSTM, BatchNormalization, Dense

ipt = Input((data.shape[1], data.shape[2]))  # 1
x = Conv1D(filters=10, kernel_size=3, padding='causal', activation='relu')(ipt)  # 2
x = LSTM(15, return_sequences=False)(x)  # 3
x = BatchNormalization()(x)  # 4
out = Dense(1, activation='relu')(x)  # 5
Now I want to add a batch normalization layer to this network. Considering that batch normalization doesn't work with LSTM, can I add it before the Conv1D layer? I think it is reasonable to have a batch normalization layer after the LSTM.
Also, where can I add Dropout in this network? In the same places (before or after batch normalization)?
- What about adding AveragePooling1D between Conv1D and LSTM? Is it possible to add batch normalization between Conv1D and AveragePooling1D in this case, without any effect on the LSTM layer?
Update: the LayerNormalization implementation I was using was inter-layer, not recurrent as in the original paper; results with the latter may prove superior.
BatchNormalization can work with LSTMs - the linked SO gives false advice; in fact, in my application of EEG classification, it dominated LayerNormalization. Now to your case:

- "Can I add it before Conv1D"? Don't - instead, standardize your data beforehand, else you're employing an inferior variant to do the same thing.
- Try BatchNormalization both before an activation and after - apply it to both Conv1D and LSTM.
- BN after LSTM may be counterproductive per its ability to introduce noise, which can confuse the classifier layer - but this is about being one layer before the output, not about LSTM itself.
- Unless you are using a stacked LSTM with return_sequences=True preceding return_sequences=False, you can place Dropout anywhere - before LSTM, after, or both.
- recurrent_dropout is still preferable to Dropout for LSTM - however, you can do both; just do not use them with activation='relu', for which LSTM is unstable per a bug.
- Pooling here is redundant and may harm performance; scarce data is better transformed via a non-linearity than by simple averaging ops.
- I recommend a SqueezeExcite block after your Conv; it's a form of self-attention - see the paper; my implementation for 1D is below.
- I also recommend trying activation='selu' with AlphaDropout and 'lecun_normal' initialization, per the paper Self-Normalizing Neural Networks.

Below is an example template you can use as a starting point; I also recommend the following SO's for further reading: Regularizing RNNs, and Visualizing RNN gradients.
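As a rough sketch of such a template (assuming tf.keras; the layer sizes, rates, and input shapes below are placeholders, and squeeze_excite_1d refers to the helper sketched under "Functions used"):

# Sketch only - layer sizes, rates, and input shapes are placeholders, not tuned values
from tensorflow.keras.layers import (Input, Conv1D, LSTM, Dense, Activation,
                                     BatchNormalization, AlphaDropout)
from tensorflow.keras.models import Model

def make_model(timesteps, channels):
    ipt = Input(shape=(timesteps, channels))
    # Conv block: BN between the (linear) conv and its activation
    x = Conv1D(filters=16, kernel_size=3, padding='causal', use_bias=False)(ipt)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    # x = squeeze_excite_1d(x)  # optionally insert the SE helper from "Functions used"
    # Recurrent block: regularize via recurrent_dropout; keep the default tanh activation
    x = LSTM(16, return_sequences=False, recurrent_dropout=0.2)(x)
    # SNN-style head: 'selu' + AlphaDropout + 'lecun_normal' initialization
    x = Dense(16, activation='selu', kernel_initializer='lecun_normal')(x)
    x = AlphaDropout(0.1)(x)
    out = Dense(1, activation='relu')(x)
    return Model(ipt, out)

model = make_model(timesteps=100, channels=4)
model.compile('adam', 'mse')

BN sits between the linear conv and its activation; recurrent regularization comes from recurrent_dropout rather than a separate Dropout layer, and the selu + AlphaDropout head follows the self-normalizing-network suggestion above.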
Functions used:
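A minimal 1D squeeze-and-excite sketch, assuming tf.keras (the name squeeze_excite_1d and the reduction ratio are placeholders, not a fixed API):

# Sketch of a 1D squeeze-and-excite block; function name and ratio are placeholders
from tensorflow.keras.layers import GlobalAveragePooling1D, Reshape, Dense, Multiply

def squeeze_excite_1d(x, ratio=4):
    channels = int(x.shape[-1])
    # Squeeze: one descriptor per channel, reshaped for broadcasting over timesteps
    se = GlobalAveragePooling1D()(x)
    se = Reshape((1, channels))(se)
    # Excite: bottleneck MLP producing per-channel gates in (0, 1)
    se = Dense(channels // ratio, activation='relu')(se)
    se = Dense(channels, activation='sigmoid')(se)
    # Rescale the original feature maps channel-wise
    return Multiply()([x, se])

It slots in directly after the Conv1D block (before the LSTM), as in the template above.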
Spatial Dropout: pass noise_shape = (batch_size, 1, channels) to Dropout - this drops entire channels rather than individual activations; see Git gist for code.
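A small demonstration of that noise_shape argument, assuming tf.keras (batch size, timesteps, and channel count are arbitrary):

# Dropping whole channels: the mask has one value per (sample, channel) and is
# broadcast across all timesteps (axis 1)
import tensorflow as tf
from tensorflow.keras.layers import Dropout

batch_size, timesteps, channels = 8, 100, 4  # arbitrary shapes
x = tf.ones((batch_size, timesteps, channels))

spatial_drop = Dropout(0.5, noise_shape=(batch_size, 1, channels))
y = spatial_drop(x, training=True)

# For a given sample/channel pair, the output is constant over timesteps:
# either all zeros (dropped) or all 2.0 (kept and rescaled by 1 / (1 - 0.5))
print(y[0, :, 0])
# Equivalent in effect to tf.keras.layers.SpatialDropout1D(0.5)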