Number of features of the model must match the inp

2019-09-22 07:51发布

i am getting this error.please give me any suggestion to resolve it.here is my code.i am taking traing data from train.csv and testing data from another file test.csv.i am new to machine learning so i could not understand what is the problem.give me any suggestion.

import quandl,math    
import numpy as np    
import pandas as pd    
import matplotlib.pyplot as plt
from matplotlib import style
import datetime
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import CountVectorizer
from sklearn import metrics
train = pd.read_csv("train.csv", index_col=None)
test = pd.read_csv("test.csv", index_col=None)
vectorizer = CountVectorizer(min_df=1)
X1 = vectorizer.fit_transform(train['question'])
Y1 = vectorizer.fit_transform(test['testing'])
X=X1.toarray()
Y=Y1.toarray()
#print(Y.shape)
number=LabelEncoder()
train['answer']=number.fit_transform(train['answer'].astype('str'))
features = ['question','answer']
y = train['answer']
clf=RandomForestClassifier(n_estimators=100)
clf.fit(X[:25],y)
predicted_result=clf.predict(Y[17])
p_result=number.inverse_transform(predicted_result)
f = open('output.txt', 'w')
t=str(p_result)
f.write(t)
print(p_result)

标签： python machine-learning scikit-learn random-forest sklearn-pandas

1条回答

我只想做你的唯一

2楼-- · 2019-09-22 08:19

There are multiple problems with your code. But the thing related to this question is that you are fitting the CountVectorizer (vectorizer) on both train and test data, which is why you are getting different features.

What you should do is:

X1 = vectorizer.fit_transform(train['question'])

# The following line is changed
Y1 = vectorizer.transform(test['testing'])

0人赞添加讨论(0) 举报

Number of features of the model must match the inp

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间