Scikit Learn - ValueError: operands could not be b

I'm trying to apply a Gaussian Naive Bayes model to a dataset to predict disease. It runs correctly when I predict on the training data, but when I predict on the testing data it raises a ValueError.

runfile('D:/ROFI/ML/Heart Disease/prediction.py', wdir='D:/ROFI/ML/Heart Disease')
Traceback (most recent call last):

  File "", line 1, in <module>
    runfile('D:/ROFI/ML/Heart Disease/prediction.py', wdir='D:/ROFI/ML/Heart Disease')

  File "C:\Users\User\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)

  File "C:\Users\User\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "D:/ROFI/ML/Heart Disease/prediction.py", line 85, in <module>
    predict(x_train, y_train, x_test, y_test)

  File "D:/ROFI/ML/Heart Disease/prediction.py", line 73, in predict
    predicted_data = model.predict(x_test)

  File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\naive_bayes.py", line 65, in predict
    jll = self._joint_log_likelihood(X)

  File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\naive_bayes.py", line 429, in _joint_log_likelihood
    n_ij -= 0.5 * np.sum(((X - self.theta_[i, :]) ** 2) /

ValueError: operands could not be broadcast together with shapes (294,14) (15,)

What's wrong here?

import pandas
from sklearn import metrics
from sklearn.preprocessing import Imputer
from sklearn.naive_bayes import GaussianNB    

def load_data(feature_columns, predicted_column):

    train_data_frame = pandas.read_excel("training_data.xlsx")
    test_data_frame = pandas.read_excel("testing_data.xlsx")
    data_frame = pandas.read_excel("data_set.xlsx")

    x_train = train_data_frame[feature_columns].values
    y_train = train_data_frame[predicted_column].values

    x_test = test_data_frame[feature_columns].values
    y_test = test_data_frame[predicted_column].values

    x_train, x_test = impute(x_train, x_test)

    return x_train, y_train, x_test, y_test


def impute(x_train, x_test):

    fill_missing = Imputer(missing_values=-9, strategy="mean", axis=0)

    x_train = fill_missing.fit_transform(x_train)
    x_test = fill_missing.fit_transform(x_test)

    return x_train, x_test


def predict(x_train, y_train, x_test, y_test):

    model = GaussianNB()
    model.fit(x_train, y_train.ravel())

    predicted_data = model.predict(x_test)
    accuracy = metrics.accuracy_score(y_test, predicted_data)
    print("Accuracy of our naive bayes model is : %.2f"%(accuracy * 100))

    return predicted_data


feature_columns = ["age", "sex", "chol", "cigs", "years", "fbs", "trestbps", "restecg", "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]
predicted_column = ["cp"]

x_train, y_train, x_test, y_test = load_data(feature_columns, predicted_column)

predict(x_train, y_train, x_test, y_test)

N.B.: Both files have the same number of columns.

Tags: python pandas numpy machine-learning scikit-learn

1 Answer

家丑人穷心不美

#2 · 2019-08-18 08:16

I found the bug. The error is caused by Imputer. Imputer replaces the missing values in a data set, but if a column consists entirely of missing values, it drops that column. One column in my testing data set was missing in every row, and because impute() calls fit_transform on the testing data separately, Imputer dropped that column there. The testing data ended up with 14 columns while the model had been trained on 15, which is exactly the shape mismatch in the error message. I removed that column's name from the feature_columns list, and it worked.
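An alternative that keeps the column is to learn the column means on the training data only and reuse them on the testing data, i.e. call transform rather than fit_transform on the test set. The following is a minimal sketch of that idea, not the code from the question: it assumes a newer scikit-learn in which SimpleImputer (from sklearn.impute) has replaced the old Imputer class, and the toy arrays and the function names impute_separately and impute_from_train are made up for illustration.

```python
import warnings

import numpy as np
from sklearn.impute import SimpleImputer

MISSING = -9  # sentinel used for missing values, as in the question

# Toy data, invented for this example: 3 feature columns.
x_train = np.array([[63.0, 1.0, 233.0],
                    [67.0, MISSING, 286.0],
                    [41.0, 0.0, 204.0]])
# In the test set the last column is missing in every row.
x_test = np.array([[56.0, 1.0, MISSING],
                   [MISSING, 0.0, MISSING]])


def impute_separately(x_train, x_test):
    """Reproduce the problem: the imputer is re-fitted on the test set."""
    imputer = SimpleImputer(missing_values=MISSING, strategy="mean")
    with warnings.catch_warnings():
        # scikit-learn may warn that a column has no observed values
        warnings.simplefilter("ignore")
        return imputer.fit_transform(x_train), imputer.fit_transform(x_test)


def impute_from_train(x_train, x_test):
    """Learn column means on the training set and reuse them on the test set."""
    imputer = SimpleImputer(missing_values=MISSING, strategy="mean")
    return imputer.fit_transform(x_train), imputer.transform(x_test)


_, bad_test = impute_separately(x_train, x_test)
good_train, good_test = impute_from_train(x_train, x_test)

print("re-fitted on test:", bad_test.shape)    # the empty column is dropped
print("fitted on train only:", good_test.shape)  # column count is preserved
```

With this approach the all-missing test column is filled with the training mean, so the test matrix keeps the same number of columns the model was trained on. Whether a feature that is never observed at test time is worth keeping is a separate modelling question; dropping it from feature_columns, as described above, is also a valid fix.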
