I am using NLTK's Naive Bayes classifier in Python to compute the probability distribution for the following example:
import nltk

def main():
    train = [(dict(feature=1), 'class_x'), (dict(feature=0), 'class_x'),
             (dict(feature=0), 'class_y'), (dict(feature=0), 'class_y')]
    test = [dict(feature=1)]

    classifier = nltk.classify.NaiveBayesClassifier.train(train)

    print("classes available:", sorted(classifier.labels()))
    print("input assigned to:", classifier.classify_many(test))

    for pdist in classifier.prob_classify_many(test):
        print("probability distribution:")
        print('%.4f %.4f' % (pdist.prob('class_x'), pdist.prob('class_y')))

if __name__ == '__main__':
    main()
There are two classes (class_x and class_y) in the training set, and each class has two training inputs. For class_x, one input has a feature value of 1 and the other a value of 0; for class_y, both inputs have a feature value of 0. The test set consists of a single input whose feature value is 1.
When I run the code, the output is:
classes available: ['class_x', 'class_y']
input assigned to: ['class_x']
probability distribution:
0.7500 0.2500
To get the probability (or likelihood) of each class, the classifier should multiply the class prior (in this case, 0.5 for each class) by the conditional probabilities of the input's features given that class, and then normalize over all classes so the results sum to 1. Some form of smoothing should also be applied.
I usually use a formula along these lines (or some variant of it):

score(class) = P(class) * product over all features of P(feature | class)

with the smoothed estimate

P(feature | class) = (frequency of feature in class + 1) / (total feature count in class + vocabulary size)

The exact smoothing can vary and slightly changes the outcome.
In the example code above, how exactly does the classifier compute the probability distribution? What is the formula used?
I checked here and here, but could not find any information about exactly how the computation is done.
Thanks in advance.