How to combine the outputs of multiple Naive Bayes classifiers

I am new to this.

I have a set of weak classifiers built with the Naive Bayes classifier (NBC) from the sklearn toolkit.

My problem is how to combine the output of each NBC to make a final decision. I want the decision to be expressed as probabilities, not labels.

I wrote the following program in Python. I assume a 2-class problem built from the iris dataset in sklearn. For demo/learning purposes, say I build 4 NBCs as follows.

from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

import numpy as np

iris = datasets.load_iris()

gnb1 = GaussianNB()
gnb2 = GaussianNB()
gnb3 = GaussianNB()
gnb4 = GaussianNB()

# The actual dataset has 3 classes; collapse it to 2 for this demo
# (original class 0 -> label 1, original classes 1 and 2 -> label 2)
target = np.where(iris.target, 2, 1)

gnb1.fit(iris.data[:, 0].reshape(150,1), target)
gnb2.fit(iris.data[:, 1].reshape(150,1), target)
gnb3.fit(iris.data[:, 2].reshape(150,1), target)
gnb4.fit(iris.data[:, 3].reshape(150,1), target)

#y_pred = gnb.predict(iris.data)
index = 0
y_prob1 = gnb1.predict_proba(iris.data[index,0].reshape(1,1))
y_prob2 = gnb2.predict_proba(iris.data[index,1].reshape(1,1))
y_prob3 = gnb3.predict_proba(iris.data[index,2].reshape(1,1))
y_prob4 = gnb4.predict_proba(iris.data[index,3].reshape(1,1))

# print(y_prob1, "\n", y_prob2, "\n", y_prob3, "\n", y_prob4)

# I just added the probabilities over all classifiers for each class
pos = y_prob1[:,1] + y_prob2[:,1] + y_prob3[:,1] + y_prob4[:,1]
neg = y_prob1[:,0] + y_prob2[:,0] + y_prob3[:,0] + y_prob4[:,0]

print(pos)
print(neg)

As you will notice, I simply added up the probabilities from each NBC to get the final score. Is this correct?

If I have done it wrong, can you please suggest some ideas so I can correct myself?

Tags: python machine-learning scikit-learn artificial-intelligence bayesian

1 Answer

看我几分像从前

Post #2 · 2019-04-11 08:31

First of all, why are you doing this? You should have one Naive Bayes model here, not one per feature. It looks like you have misunderstood how the classifier works. What you did is essentially what Naive Bayes already does internally: it treats each feature independently. But because these are probabilities, you should multiply them (or add their logarithms) rather than add them. So:

1. You should just have one NB: gnb.fit(iris.data, target)
2. If you insist on having many NBs, merge them by multiplying the probabilities or by adding their logarithms (the two are mathematically equivalent, but multiplication is less numerically stable):

pos = y_prob1[:,1] * y_prob2[:,1] * y_prob3[:,1] * y_prob4[:,1]

or

pos = np.exp(np.log(y_prob1[:,1]) + np.log(y_prob2[:,1]) + np.log(y_prob3[:,1]) + np.log(y_prob4[:,1]))

You can also predict the logarithms directly with gnb.predict_log_proba instead of gnb.predict_proba.
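As a rough sketch of the log-space version, assuming the same four single-feature models as in the question (the names `models`, `log_scores` and `prod_scores` are mine, not from the question), the per-model log-probabilities can be summed for both classes at once and checked against the plain product. No prior correction is applied at this point.

```python
import numpy as np
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
X = iris.data
# Same two-class relabeling as in the question: class 0 -> 1, classes 1 and 2 -> 2
y = np.where(iris.target, 2, 1)

# One single-feature model per column, as in the question
models = [GaussianNB().fit(X[:, [j]], y) for j in range(X.shape[1])]

# Sum of the per-model log-probabilities, shape (n_samples, n_classes)
log_scores = sum(m.predict_log_proba(X[:, [j]]) for j, m in enumerate(models))

# Plain product of the per-model probabilities, for comparison only
prod_scores = np.prod(
    [m.predict_proba(X[:, [j]]) for j, m in enumerate(models)], axis=0
)

# Both routes give the same (still unnormalized) scores
print(np.allclose(np.exp(log_scores), prod_scores))
```

Working on all samples and both classes in one array avoids writing out pos and neg by hand for each model.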

However, this approach has one flaw: each Naive Bayes model also includes the class prior in its output probabilities, so the product counts the prior four times and the resulting distribution is heavily skewed. You have to correct for this manually:

pos_prior = gnb1.class_prior_[1] # all models have the same prior so we can use the one from gnb1

pos = pos_prior * (y_prob1[:,1]/pos_prior) * (y_prob2[:,1]/pos_prior) * (y_prob3[:,1]/pos_prior) * (y_prob4[:,1]/pos_prior)

which simplifies to

pos = y_prob1[:,1] * y_prob2[:,1] * y_prob3[:,1] * y_prob4[:,1] / pos_prior**3

and for log to

pos = ... - 3 * np.log(pos_prior)
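Here is a minimal sketch of the whole correction in log space, done for both classes at once. The final normalization across classes is my addition: the prior-corrected scores are not probabilities until each row is rescaled to sum to 1. The variable names and the use of `scipy.special.logsumexp` are my own choices, not something from the question. The result is compared with a single GaussianNB trained on all four features.

```python
import numpy as np
from scipy.special import logsumexp
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
X = iris.data
# Same two-class relabeling as in the question
y = np.where(iris.target, 2, 1)

# One single-feature model per column, as in the question
models = [GaussianNB().fit(X[:, [j]], y) for j in range(X.shape[1])]

# All models were fit on the same labels, so they share the same class prior
log_prior = np.log(models[0].class_prior_)  # shape (n_classes,)

# Sum the per-model log-posteriors, then remove the extra copies of the prior
# (4 models -> the prior was counted 4 times, keep it once, subtract 3 times)
log_joint = sum(m.predict_log_proba(X[:, [j]]) for j, m in enumerate(models))
log_joint = log_joint - (len(models) - 1) * log_prior

# Normalize across classes so each row is a proper probability distribution
log_post = log_joint - logsumexp(log_joint, axis=1, keepdims=True)
combined = np.exp(log_post)

# Reference: one Naive Bayes model on all four features
single = GaussianNB().fit(X, y).predict_proba(X)

print(combined[0], single[0])
```

Up to tiny numerical differences, `combined` should match `single`, which is the practical argument for simply training one model on all features.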

So once again: you should use option 1.
