How do I load a pre-trained word embedding into a Keras Embedding layer?

I downloaded glove.6B.50d.txt (from the glove.6B.zip file at https://nlp.stanford.edu/projects/glove/) and I'm not sure how to load it into a Keras Embedding layer (see https://keras.io/layers/embeddings/).
You will need to pass an embeddingMatrix to the Embedding layer as follows:

```python
Embedding(vocabLen, embDim, weights=[embeddingMatrix], trainable=isTrainable)
```
- vocabLen: the number of tokens in your vocabulary
- embDim: the dimension of the embedding vectors (50 in your example)
- embeddingMatrix: the embedding matrix built from glove.6B.50d.txt
- isTrainable: whether you want the embeddings to be trainable, or to freeze the layer
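As an example, here is a minimal, self-contained sketch of how such a layer slots into a model, assuming the Keras 2 API (where weights=[...] sets the layer's initial weights); the sizes and the zero-filled matrix are placeholders, and building the real matrix from the GloVe file is shown further below:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

vocabLen, embDim, maxLen = 400001, 50, 10       # illustrative sizes (400k GloVe words + 1)
embeddingMatrix = np.zeros((vocabLen, embDim))  # placeholder; fill from glove.6B.50d.txt

model = Sequential()
model.add(Embedding(vocabLen, embDim, weights=[embeddingMatrix],
                    trainable=False, input_length=maxLen))
model.compile("rmsprop", "mse")

# Each integer token id in the input is looked up as a row of embeddingMatrix:
output = model.predict(np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]))
print(output.shape)  # (1, 10, 50)
```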
The glove.6B.50d.txt file is a list of whitespace-separated values: a word token followed by its (50) embedding values, e.g.

```
the 0.418 0.24968 -0.41242 ...
```
To create a pretrainedEmbeddingLayer from a GloVe file, you can do something along the lines of the sketch below.
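This is a minimal sketch, not a library routine: the helper names readGloveFile and pretrainedEmbeddingLayer are illustrative, index 0 is reserved for padding, and the Keras 2 weights=[...] convention is assumed for setting the initial weights.

```python
import numpy as np
from keras.layers import Embedding

def readGloveFile(gloveFile):
    # Parse the GloVe text file into word -> vector and word -> index mappings.
    wordToGlove = {}  # word -> numpy array of its embedding values
    with open(gloveFile, encoding="utf8") as f:
        for line in f:
            record = line.strip().split()
            wordToGlove[record[0]] = np.asarray(record[1:], dtype=np.float32)
    # Assign each word a row index, starting at 1 (row 0 is reserved for padding).
    wordToIndex = {word: idx for idx, word in enumerate(sorted(wordToGlove), start=1)}
    return wordToGlove, wordToIndex

def pretrainedEmbeddingLayer(wordToGlove, wordToIndex, isTrainable):
    vocabLen = len(wordToIndex) + 1                     # +1 for the reserved row 0
    embDim = next(iter(wordToGlove.values())).shape[0]  # 50 for glove.6B.50d.txt
    embeddingMatrix = np.zeros((vocabLen, embDim))      # row 0 stays all zeros
    for word, index in wordToIndex.items():
        embeddingMatrix[index, :] = wordToGlove[word]   # copy each word's vector
    return Embedding(vocabLen, embDim, weights=[embeddingMatrix], trainable=isTrainable)

wordToGlove, wordToIndex = readGloveFile("glove.6B.50d.txt")
embeddingLayer = pretrainedEmbeddingLayer(wordToGlove, wordToIndex, isTrainable=False)
```

When you tokenize your own text, map each word through wordToIndex so the integer inputs line up with the rows of embeddingMatrix.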
There is also a great blog post describing how to create an embedding layer with pre-trained word vector embeddings:
https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html
Code for the above article can be found here:
https://github.com/keras-team/keras/blob/master/examples/pretrained_word_embeddings.py
Another good blog post covering the same topic: https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/