Keras - Text Classification - LSTM - How to input

I'm trying to understand how to use an LSTM to classify a dataset that I have.

I found this Keras example that uses the IMDB dataset: https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py

However, I'm confused about how the dataset must be processed before it is used as input.

I know Keras has text pre-processing methods, but I'm not sure which one to use.

x contains n lines of text, and y scores each text by happiness/sadness: 1.0 means 100% happy and 0.0 means totally sad. Values in between are possible, for example 0.25.

So my question is: how do I input x and y properly? Do I have to use bag of words? Any tips are appreciated!

I wrote the code below, but I keep getting the same error:

#('Bad input argument to theano function with name ... at index 1(0-based)', 
'could not convert string to float: negative')

import keras.preprocessing.text
import numpy as np

np.random.seed(1337)  # for reproducibility

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM

print('Loading data...')
import pandas

thedata = pandas.read_csv("dataset/text.csv", sep=', ', delimiter=',', header='infer', names=None)

x = thedata['text']
y = thedata['sentiment']

x = x.iloc[:].values
y = y.iloc[:].values

###################################
tk = keras.preprocessing.text.Tokenizer(nb_words=2000, filters=keras.preprocessing.text.base_filter(), lower=True, split=" ")
tk.fit_on_texts(x)

x = tk.texts_to_sequences(x)


###################################
max_len = 80
print "max_len ", max_len
print('Pad sequences (samples x time)')

x = sequence.pad_sequences(x, maxlen=max_len)

#########################
max_features = 20000
print('Build model...')

model = Sequential()
model.add(Embedding(max_features, 128, input_length=max_len, dropout=0.2))
model.add(LSTM(128, dropout_W=0.2, dropout_U=0.2))
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='rmsprop')

model.fit(x, y=y, batch_size=200, nb_epoch=1, verbose=1, validation_split=0.2, show_accuracy=True, shuffle=True)

# at index 1(0-based)', 'could not convert string to float: negative')

Tags: theano keras lstm lasagne

1 Answer

看我几分像从前

#2 · 2019-06-21 23:42

Review how your CSV parser reads the text in. If you want to use the parser as written in your code, make sure the fields are in the format Text, Sentiment.
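
The word "negative" in the error message suggests that the sentiment column holds string labels rather than numbers, so y may need converting before it reaches model.fit. Below is a minimal sketch of reading the file and building a numeric y. The inline sample data, the "positive"/"negative" mapping and the helper name load_xy are assumptions for illustration, since the real contents of dataset/text.csv are not shown in the question.

```python
import io

import pandas as pd

# Hypothetical stand-in for dataset/text.csv (the real file is not shown).
SAMPLE_CSV = """text, sentiment
I love this, positive
This is awful, negative
Could be worse, 0.25
"""

# Assumed mapping from string labels to scores between 0.0 and 1.0.
LABEL_TO_SCORE = {"positive": 1.0, "negative": 0.0}


def load_xy(source):
    """Read a text/sentiment CSV and return (list of texts, float32 label array)."""
    # Pass a single separator argument; skipinitialspace tolerates "text, sentiment".
    df = pd.read_csv(source, sep=",", skipinitialspace=True)

    # Fail early if the header is not in the expected text, sentiment layout.
    missing = {"text", "sentiment"} - set(df.columns)
    if missing:
        raise ValueError("missing columns: %s" % sorted(missing))

    texts = df["text"].astype(str).tolist()

    # Replace known string labels with scores; other values are left as they are
    # and must already be numeric, otherwise to_numeric raises a ValueError.
    labels = df["sentiment"].map(
        lambda v: LABEL_TO_SCORE.get(str(v).strip().lower(), v)
    )
    y = pd.to_numeric(labels).astype("float32").to_numpy()
    return texts, y


x, y = load_xy(io.StringIO(SAMPLE_CSV))
print(x)
print(y)
```

For the real data, load_xy("dataset/text.csv") would take the place of the StringIO call. The returned texts can then go through the tokenizer and pad_sequences as in the question, and y is already a float array.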
