Pybrain Text Classification: data and input

Posted 2019-06-03 23:28

I have 3 sets of sentences (varying in word counts), but I don't know how to extract features from the text such that the input dimension will remain the same.

For example, I've tried bag-of-words but, since the word-count variation causes input-dimension variation, I eventually get errors.

I would much appreciate it if you could show me an approach to preparing the string data for the neural network.

Thank you!

(Python 2.7 in Windows 7)

Tags: python machine-learning neural-network feature-extraction pybrain

1 answer

爱情/是我丢掉的垃圾

#2 · 2019-06-03 23:42

How to format the input

This is an excerpt from the bag-of-words article on wikipedia.org:

Here are two simple text documents:

John likes to watch movies. Mary likes too.
John also likes to watch football games.

Based on these two text documents, a dictionary is constructed as:

{
    "John": 1,
    "likes": 2,
    "to": 3,
    "watch": 4,
    "movies": 5,
    "also": 6,
    "football": 7,
    "games": 8,
    "Mary": 9,
    "too": 10
}

which has 10 distinct words. Using the indexes of the dictionary, each document is represented by a 10-entry vector in which entry i counts how many times word i occurs in that document:

[1, 2, 1, 1, 1, 0, 0, 0, 1, 1]
[1, 1, 1, 1, 0, 1, 1, 1, 0, 0]

Your input will remain the same size (one entry per dictionary word), regardless of the length of your document. I hope this helps.
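
To make this concrete, here is a minimal pure-Python sketch of the counting step. The helper names (`tokenize`, `vectorize`) and the regex-based tokenizer are my own assumptions for illustration, not part of PyBrain; the word list is hard-coded in the same order as the dictionary above so the output matches the two vectors. In practice you would collect the word list from your own training sentences.

```python
import re

# Vocabulary in the same order as the dictionary above
# (positions are 0-based here, 1-based in the excerpt).
VOCAB = ["John", "likes", "to", "watch", "movies",
         "also", "football", "games", "Mary", "too"]
WORD_INDEX = dict((word, i) for i, word in enumerate(VOCAB))

def tokenize(text):
    # Hypothetical tokenizer: runs of letters, digits, or apostrophes; case is kept.
    return re.findall(r"[A-Za-z0-9']+", text)

def vectorize(text, word_index):
    # Count occurrences of each known word; words outside the vocabulary are ignored.
    vec = [0] * len(word_index)
    for word in tokenize(text):
        i = word_index.get(word)
        if i is not None:
            vec[i] += 1
    return vec

docs = [
    "John likes to watch movies. Mary likes too.",
    "John also likes to watch football games.",
]
vectors = [vectorize(d, WORD_INDEX) for d in docs]
print(vectors[0])  # [1, 2, 1, 1, 1, 0, 0, 0, 1, 1]
print(vectors[1])  # [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]

# A sentence of a different length still yields a 10-entry vector.
print(vectorize("Mary likes football a lot", WORD_INDEX))  # [0, 1, 0, 0, 0, 0, 1, 0, 1, 0]
```

Each resulting list has `len(VOCAB)` entries, so that is the number you would use as the input dimension of your network. Words that never appeared in the vocabulary are simply dropped here, which is one common choice but not the only one.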
