Can someone please explain this? I know bidirectional LSTMs have a forward and backward pass but what is the advantage of this over a unidirectional LSTM?
What is each of them better suited for?
In comparison to LSTM, BLSTM or BiLSTM has two networks: one accesses past information in the forward direction and the other accesses future information in the reverse direction (wiki).

A new class, Bidirectional, is added as per the official doc here: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Bidirectional, and an activation function can be added like this:
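A minimal sketch, assuming tf.keras and illustrative layer sizes (this is not the answer's original snippet):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=128),
    # Bidirectional runs the wrapped LSTM forwards and backwards and,
    # by default, concatenates the two outputs (merge_mode='concat')
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, activation='tanh')),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
```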
A complete example using the IMDB data, trained for 4 epochs, would look like the sketch below.
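A sketch of the full pipeline, assuming tf.keras, its built-in IMDB dataset, and illustrative hyperparameters:

```python
import tensorflow as tf

max_features = 20000  # vocabulary size
maxlen = 100          # cut reviews off after 100 tokens

# Load and pad the IMDB review data
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(
    num_words=max_features)
x_train = tf.keras.preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = tf.keras.preprocessing.sequence.pad_sequences(x_test, maxlen=maxlen)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(max_features, 128),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation='sigmoid'),  # positive/negative review
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])

# Train for 4 epochs
model.fit(x_train, y_train, batch_size=32, epochs=4,
          validation_data=(x_test, y_test))
```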
Adding to Bluesummer's answer, here is how you would implement a bidirectional LSTM from scratch, without calling the Bidirectional module. This might better contrast the difference between unidirectional and bidirectional LSTMs. As you can see below, we merge two LSTMs to create a bidirectional LSTM. You can merge the outputs of the forward and backward LSTMs using any of {'sum', 'mul', 'concat', 'ave'}.
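A minimal sketch of the idea, assuming tf.keras; the layer sizes and the pooling/classification head are illustrative:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(100,))               # 100 timesteps of token ids
embedded = tf.keras.layers.Embedding(20000, 128)(inputs)

# Forward LSTM reads the sequence left to right
forward = tf.keras.layers.LSTM(64, return_sequences=True)(embedded)

# Backward LSTM: go_backwards=True feeds the sequence in reverse, so its
# output must be re-reversed to align timestep-by-timestep with the forward one
backward = tf.keras.layers.LSTM(64, return_sequences=True,
                                go_backwards=True)(embedded)
backward = tf.keras.layers.Lambda(lambda t: tf.reverse(t, axis=[1]))(backward)

# Merge the two directions; Concatenate corresponds to 'concat', while
# Add, Multiply, and Average give 'sum', 'mul', and 'ave'
merged = tf.keras.layers.Concatenate()([forward, backward])

pooled = tf.keras.layers.GlobalMaxPooling1D()(merged)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(pooled)

model = tf.keras.Model(inputs, outputs)
```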
Another use case for bidirectional LSTMs is word classification in text. They can see the past and future context of each word, which makes them much better suited to classifying it.
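For instance, a per-word tagger only needs return_sequences=True so that the model emits one prediction per position, informed by both directions. A hypothetical sketch (num_tags and layer sizes are made up):

```python
import tensorflow as tf

num_tags = 10  # e.g. part-of-speech tags; illustrative

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(20000, 128),
    # return_sequences=True keeps one output vector per word
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, return_sequences=True)),
    # a softmax over the tag set at every word position
    tf.keras.layers.Dense(num_tags, activation='softmax'),
])
```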
They can also be helpful in time series forecasting problems, such as predicting the electricity consumption of a household. A plain LSTM can be used here as well, but a bidirectional LSTM will generally do a better job.
At its core, an LSTM preserves information from inputs that have already passed through it, using its hidden state.
A unidirectional LSTM only preserves information from the past, because the only inputs it has seen are from the past.
Using a bidirectional LSTM runs your inputs in two ways: one from past to future and one from future to past. What differentiates this approach from a unidirectional LSTM is that the LSTM running backwards preserves information from the future; combining the two hidden states means that at any point in time you preserve information from both past and future.
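A quick shape check illustrates the combined states: with the default merge_mode='concat', the forward and backward hidden states are stacked at every timestep, doubling the feature dimension (sizes here are arbitrary):

```python
import tensorflow as tf

x = tf.random.normal((1, 5, 8))  # (batch, timesteps, features)

uni = tf.keras.layers.LSTM(16, return_sequences=True)
bi = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(16, return_sequences=True))

print(uni(x).shape)  # (1, 5, 16) - past context only
print(bi(x).shape)   # (1, 5, 32) - forward + backward states concatenated
```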
What each is suited for is a complicated question, but BiLSTMs show very good results because they understand context better. I will try to explain through an example.
Let's say we try to predict the next word in a sentence. At a high level, what a unidirectional LSTM sees is

The boys went to ...

and it will try to predict the next word from this context alone. With a bidirectional LSTM you are able to see information further down the road, for example:

Forward LSTM:

The boys went to ...

Backward LSTM:

... and then they got out of the pool
You can see that by using information from the future, it could be easier for the network to understand what the next word is.