Pybrain Text Classification: data and input

Posted 2019-06-03 23:14

聊天终结者

Question:

I have 3 sets of sentences (of varying word counts), but I don't know how to extract features from the text so that the input dimension stays the same.

For example, I've tried bag-of-words but, since the word-count variation causes input-dimension variation, I eventually get errors.

I would much appreciate it if you could show me an approach to preparing the string data for the neural network.

Thank you!

(Python 2.7 in Windows 7)

Answer 1:

How to format the input

This is an excerpt from wikipedia.org:

Here are two simple text documents:

John likes to watch movies. Mary likes too.
John also likes to watch football games.

Based on these two text documents, a dictionary is constructed as:

{
    "John": 1,
    "likes": 2,
    "to": 3,
    "watch": 4,
    "movies": 5,
    "also": 6,
    "football": 7,
    "games": 8,
    "Mary": 9,
    "too": 10
}

The dictionary has 10 distinct words. Using its indexes, each document is represented by a 10-entry vector in which entry i counts how many times word i occurs in that document:

[1, 2, 1, 1, 1, 0, 0, 0, 1, 1]
[1, 1, 1, 1, 0, 1, 1, 1, 0, 0]

Because the vector length equals the number of words in the dictionary, your input will remain the same size regardless of the length of your document. I hope this helps.
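As a rough illustration, the bag-of-words idea above could be sketched in plain Python as follows. The helper names (`tokenize`, `build_vocabulary`, `vectorize`) and the regex-based tokenizer are assumptions made for this example and are not part of PyBrain. The sketch indexes words from 0 in first-seen order, so the columns come out in a different order than in the dictionary quoted above, but every vector still has 10 entries.

```python
import re
from collections import Counter


def tokenize(text):
    # Simplifying assumption: a word is a run of letters, digits, or apostrophes.
    return re.findall(r"[A-Za-z0-9']+", text)


def build_vocabulary(documents):
    # Map each distinct word to a column index, in first-seen order.
    vocab = {}
    for doc in documents:
        for word in tokenize(doc):
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def vectorize(text, vocab):
    # Count each known word; words outside the vocabulary are ignored,
    # so the result always has exactly len(vocab) entries.
    counts = Counter(tokenize(text))
    vector = [0] * len(vocab)
    for word, n in counts.items():
        index = vocab.get(word)
        if index is not None:
            vector[index] = n
    return vector


docs = [
    "John likes to watch movies. Mary likes too.",
    "John also likes to watch football games.",
]
vocab = build_vocabulary(docs)
vectors = [vectorize(doc, vocab) for doc in docs]

# A new, longer sentence still maps to a vector of the same length.
unseen = vectorize("Mary watches football with John and John", vocab)
```

Build the vocabulary once from the training sentences and reuse it for every later sentence. `len(vocab)` is then the fixed input dimension to give the network.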

Tags: python machine-learning neural-network feature-extraction pybrain
