How to do text classification with label probabili

I'm trying to solve a text classification problem for academic purpose. I need to classify the tweets into labels like "cloud" ,"cold", "dry", "hot", "humid", "hurricane", "ice", "rain", "snow", "storms", "wind" and "other". Each tweet in training data has probabilities against all the label. Say the message "Can already tell it's going to be a tough scoring day. It's as windy right now as it was yesterday afternoon." has 21% chance for being hot and 79% chance for wind. I have worked on the classification problems which predicts whether its wind or hot or others. But in this problem, each training data has probabilities against all the labels. I have previously used mahout naive bayes classifier which take a specific label for a given text to build model. How to convert these input probabilities for various labels as input to any classifier?

标签： machine-learning classification mahout text-mining

2条回答

孤傲高冷的网名

2楼-- · 2019-08-30 07:01

In a probabilistic setting, these probabilities reflect uncertainty about the class label of your training instance. This affects parameter learning in your classifier.

There's a natural way to incorporate this: in Naive Bayes, for instance, when estimating parameters in your models, instead of each word getting a count of one for the class to which the document belongs, it gets a count of probability. Thus documents with high probability of belonging to a class contribute more to that class's parameters. The situation is exactly equivalent to when learning a mixture of multinomials model using EM, where the probabilities you have are identical to the membership/indicator variables for your instances.

Alternatively, if your classifier were a neural net with softmax output, instead of the target output being a vector with a single [1] and lots of zeros, the target output becomes the probability vector you're supplied with.

I don't, unfortunately, know of any standard implementations that would allow you to incorporate these ideas.

0人赞添加讨论(0) 举报

家丑人穷心不美

3楼-- · 2019-08-30 07:04

If you want an off the shelf solution, you could use a learner the supports multiclass classification and instance weights. Let's say you have k classes with probabilities p_1, ..., p_k. For each input instance, create k new training instances with identical features, and with label 1, ..., k, and assign weights p_1, ..., p_k respectively.

Vowpal Wabbit is one such learner that supports multiclass classification with instance weights.

0人赞添加讨论(0) 举报

How to do text classification with label probabili

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间