Feature selection for named entity recognition using SVM

I have some user comment data from which I want to extract the names of consumer electronics brands. For instance, consider these ne_chunk outputs for example sentences that mention "PS4", "nokia 720 lumia", "apple ipad", and "Sony bravia":

In [52]: nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize('When is the PS4 releasing')))
Out[52]: Tree('S', [('When', 'WRB'), ('is', 'VBZ'), ('the', 'DT'), Tree('ORGANIZATION', [('PS4', 'NNP')]), ('releasing', 'NN')])

In [53]: nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize('I couldnt find the nokia 720 lumia in stores')))
Out[53]: Tree('S', [('I', 'PRP'), ('couldnt', 'VBP'), ('find', 'JJ'), ('the', 'DT'), ('nokia', 'NN'), ('720', 'CD'), ('lumia', 'NN'), ('in', 'IN'), ('stores', 'NNS')])

In [54]: nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize('I just bought apple ipad and its really awesome')))
Out[54]: Tree('S', [('I', 'PRP'), ('just', 'RB'), ('bought', 'VBD'), ('apple', 'JJ'), ('ipad', 'NN'), ('and', 'CC'), ('its', 'PRP$'), ('really', 'RB'), ('awesome', 'JJ')])

In [55]: nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize('I would like to buy 1 Sony bravia led television')))
Out[55]: Tree('S', [('I', 'PRP'), ('would', 'MD'), ('like', 'VB'), ('to', 'TO'), ('buy', 'VB'), ('1', 'CD'), ('Sony', 'NNP'), ('bravia', 'IN'), ('led', 'VBN'), ('television', 'NN')])

The problem is how to represent the data so that the SVM can learn from it. I have read dozens of research papers, but none of them disclose how they represented the feature data for the SVM. Can anybody please help?

Tags: python nltk svm named-entity-recognition

1 Answer

别忘想泡老子

#2 · 2020-03-26 02:44

What I would do is add all the electronics brands you care about to a list, and then, so that each entry gets a unique value, use the entry's index in the list as a feature.

e.g. ['Nokia', 'Apple', 'Microsoft']

then: Nokia => 0, Apple => 1, Microsoft => 2, etc.

This gives each brand a unique representation, which can then serve as one feature for the SVM, among others, I presume.
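A minimal sketch of that lookup idea, assuming zero-based Python list positions, case-insensitive matching, and -1 for tokens that are not in the list. The names `BRANDS`, `brand_feature`, `one_hot_brand_feature`, and `sentence_features` are made up for illustration. The one-hot variant is an addition of mine, not part of the suggestion above: a raw integer index implies an ordering between brands (Microsoft "greater than" Nokia) that an SVM may pick up on, so a one-hot vector is often the safer encoding.

```python
# Hypothetical brand list; extend it with the brands you care about.
BRANDS = ['Nokia', 'Apple', 'Microsoft']

# Map each lower-cased brand to its position in the list.
BRAND_INDEX = {brand.lower(): i for i, brand in enumerate(BRANDS)}


def brand_feature(token):
    """Return the brand's list index, or -1 if the token is not a known brand."""
    return BRAND_INDEX.get(token.lower(), -1)


def one_hot_brand_feature(token):
    """Return a one-hot vector over BRANDS (all zeros for unknown tokens)."""
    vector = [0] * len(BRANDS)
    index = brand_feature(token)
    if index >= 0:
        vector[index] = 1
    return vector


def sentence_features(sentence):
    """Build one (token, index, one-hot) row per whitespace-separated token."""
    return [(token, brand_feature(token), one_hot_brand_feature(token))
            for token in sentence.split()]


if __name__ == '__main__':
    for row in sentence_features('I just bought apple ipad and its really awesome'):
        print(row)
```

Each row could then be combined with other per-token features (for example the POS tags from the question's output) before being handed to the SVM. A fixed list like this only recognises brands it already contains, so it will not help with unseen names such as "PS4" unless they are added.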
