What is the difference between an Embedding Layer and a Dense Layer?

Posted 2019-01-24 06:02

The docs for an Embedding Layer in Keras say:

Turns positive integers (indexes) into dense vectors of fixed size. eg. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]

I believe this could also be achieved by encoding the inputs as one-hot vectors of length vocabulary_size, and feeding them into a Dense Layer.

Is an Embedding Layer merely a convenience for this two-step process, or is something fancier going on under the hood?

Tags: machine-learning neural-network deep-learning keras keras-layer

1 answer

男人必须洒脱

#2 · 2019-01-24 06:39

Mathematically, the difference is this:

An embedding layer performs a select (lookup) operation. In Keras, this layer is equivalent to:
```
K.gather(self.embeddings, inputs)      # just one matrix
```

A dense layer performs a dot-product operation, adds a bias, and applies an optional activation:

```
outputs = matmul(inputs, self.kernel)  # a kernel matrix
outputs = bias_add(outputs, self.bias) # a bias vector
return self.activation(outputs)        # an activation function
```
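To make the two operations concrete, here is a minimal NumPy sketch of both forward passes. It is not the Keras source: `embedding_forward`, `dense_forward`, and the toy sizes are names and values made up for illustration, and NumPy indexing stands in for `K.gather`.

```python
import numpy as np

def embedding_forward(embeddings, indices):
    """Row lookup: pick one row of `embeddings` per integer index."""
    return embeddings[indices]  # NumPy analogue of a gather

def dense_forward(inputs, kernel, bias, activation=None):
    """Dot product with the kernel, then bias, then an optional activation."""
    outputs = inputs @ kernel + bias
    return outputs if activation is None else activation(outputs)

rng = np.random.default_rng(0)
vocab_size, embed_dim = 50, 2  # toy sizes, assumed for illustration

# Embedding: integer indices in, one stored row out per index.
embeddings = rng.normal(size=(vocab_size, embed_dim))
indices = np.array([[4], [20]])  # same input as the docs example
looked_up = embedding_forward(embeddings, indices)
print(looked_up.shape)  # (2, 1, 2)

# Dense: real-valued vectors in, a full matrix product per input.
kernel = rng.normal(size=(vocab_size, embed_dim))
bias = np.zeros(embed_dim)
dense_in = rng.normal(size=(2, vocab_size))
dense_out = dense_forward(dense_in, kernel, bias, activation=np.tanh)
print(dense_out.shape)  # (2, 2)
```

The lookup never multiplies anything: it only copies rows 4 and 20 of the matrix, while the dense pass touches every entry of the kernel for every input.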

You can emulate an embedding layer with a fully connected layer via one-hot encoding, but the whole point of a dense embedding is to avoid the one-hot representation. In NLP, the vocabulary size can be on the order of 100k words (sometimes even a million). On top of that, sequences of words usually have to be processed in batches. Processing a batch of sequences of word indices is much more efficient than processing a batch of sequences of one-hot vectors, because each word is a single integer rather than a vector of length vocabulary_size. In addition, the gather operation itself is faster than a matrix dot product, in both the forward and the backward pass.
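The equivalence and the size difference can be checked with a small NumPy sketch. The sizes below are arbitrary assumptions kept small so the one-hot tensor fits comfortably in memory, and the dense route is taken without bias or activation.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 1000, 8  # toy sizes, assumed for illustration
batch_size, seq_len = 4, 6

weights = rng.normal(size=(vocab_size, embed_dim))
indices = rng.integers(0, vocab_size, size=(batch_size, seq_len))

# Embedding route: gather one row of the matrix per word index.
gathered = weights[indices]              # shape (4, 6, 8)

# Dense route: expand every index to a one-hot vector, then matmul.
one_hot = np.eye(vocab_size)[indices]    # shape (4, 6, 1000)
projected = one_hot @ weights            # shape (4, 6, 8)

same = np.allclose(gathered, projected)
print(same)                              # True

# Number of input elements each route has to carry around.
print(indices.size, one_hot.size)        # 24 24000
```

Both routes give the same vectors, but the one-hot input is vocabulary_size times larger than the index input, and almost all of the matrix product is spent multiplying by zeros.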
