Use string as input in Keras IMDB example

Posted 2019-07-10 05:01

I was looking at the Keras IMDB Movie reviews sentiment classification example (and the corresponding model on GitHub), which learns to decide whether a review is positive or negative.

The data has been preprocessed such that each review is encoded as a sequence of integers, e.g. the review "This movie is awesome!" would be [11, 17, 6, 1187] and for this input the model gives the output 'positive'.

The dataset also makes available the word index used for encoding the sequences, i.e. I know the map

This: 11
movie: 17
is: 6
awesome: 1187
...

Can I somehow include this knowledge into the model such that its input is a string, i.e. it gives a prediction based on the input "This movie is awesome!"?

Tags: tensorflow machine-learning nlp keras tensorflow-serving

1 answer

Root（大扎）

Reply #2 · 2019-07-10 05:39

First up, the input to a neural network is never a string; it is always a list of indices of words (or characters) in a vocabulary. The first thing the model usually does is an embedding transformation (see the example), which converts these indices into (trainable) float vectors.
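To make the embedding step concrete, here is a minimal pure-Python sketch of the lookup idea. It is not the Keras implementation: `VOCAB_SIZE`, `EMBED_DIM`, `embedding_table` and `embed` are names made up for illustration, and the random values stand in for weights that a real model would learn during training.

```python
import random

random.seed(0)

VOCAB_SIZE = 2000  # hypothetical vocabulary size
EMBED_DIM = 4      # hypothetical embedding dimension

# One float vector per vocabulary index. In a real model these values
# are trainable weights; here they are just random placeholders.
embedding_table = [
    [random.uniform(-0.05, 0.05) for _ in range(EMBED_DIM)]
    for _ in range(VOCAB_SIZE)
]


def embed(indices, table):
    """Replace every word index with the corresponding row of the table."""
    return [table[i] for i in indices]


# The encoded review from the question becomes a list of 4 float vectors.
vectors = embed([11, 17, 6, 1187], embedding_table)
print(len(vectors), len(vectors[0]))
```

The point is only that the network consumes integer indices and turns them into floats itself; a raw string never reaches this layer.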

What you really mean is a data pre-processing step that transforms the raw input from the user (text, image pixels, a sound recording, etc.) into a format that is suitable and convenient for the model. Data pre-processing is as essential a part of a machine-learning application as the model itself, and should be stored separately from it. If you intend to work with the IMDB dataset, the vocabulary is already pre-processed: you can call imdb.get_word_index() in Keras to get the word index, or you can work with the vocabulary JSON file directly.
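A pre-processing step of this kind could look roughly like the sketch below. It makes several assumptions: `WORD_INDEX` is a tiny hypothetical stand-in built from the numbers in the question (in practice the dictionary would come from imdb.get_word_index(), whose keys are lowercase words), `encode_review` is a helper name invented here, and the regex tokenizer is a simplification that may not match how the dataset was originally tokenized.

```python
import re

# Hypothetical miniature word index using the numbers from the question.
# In practice: word_index = keras.datasets.imdb.get_word_index()
WORD_INDEX = {"this": 11, "movie": 17, "is": 6, "awesome": 1187}


def encode_review(text, word_index, index_from=0, oov_index=None):
    """Turn a raw review string into a list of word indices.

    index_from: offset added to every known word's index.
    oov_index:  index used for unknown words; if None, they are dropped.
    """
    # Lowercase and keep only runs of letters, digits and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    encoded = []
    for token in tokens:
        idx = word_index.get(token)
        if idx is not None:
            encoded.append(idx + index_from)
        elif oov_index is not None:
            encoded.append(oov_index)
    return encoded


print(encode_review("This movie is awesome!", WORD_INDEX))  # [11, 17, 6, 1187]
```

One thing to watch for: imdb.load_data() by default shifts every index by `index_from=3` and reserves 1 for the start marker and 2 for out-of-vocabulary words. If the model was trained on data loaded with those defaults, the same offset has to be applied here, e.g. `encode_review(text, word_index, index_from=3, oov_index=2)`, and the resulting list then still needs to be padded to the length the model expects before calling predict.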
