I'm trying to understand how to use k-fold cross-validation from the sklearn Python module.
I understand the basic flow:
- instantiate a model, e.g.
model = LogisticRegression()
- fit the model, e.g.
model.fit(xtrain, ytrain)
- predict, e.g.
model.predict(xtest)
- use e.g. cross_val_score to test the fitted model's accuracy.
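Putting those steps together, here is a minimal runnable sketch of what I mean (the iris data, the variable names and the max_iter setting are just placeholders I picked for the example):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# minimal sketch of the flow above; iris is just a stand-in dataset
X, y = load_iris(return_X_y=True)
xtrain, xtest, ytrain, ytest = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000)  # instantiate
model.fit(xtrain, ytrain)                  # fit on the training split
predictions = model.predict(xtest)         # predict from the test features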
Where I'm confused is using sklearn's KFold together with cross_val_score. As I understand it, cross_val_score fits the model and predicts on each of the k folds, giving you an accuracy score per fold, e.g. using code like this:
from sklearn import linear_model
from sklearn.model_selection import KFold, cross_val_score

kf = KFold(n_splits=5, shuffle=True, random_state=8)
lr = linear_model.LogisticRegression()
accuracies = cross_val_score(lr, X_train, y_train, scoring='accuracy', cv=kf)
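Just to be explicit about what I think that gives me back:

print(accuracies)         # one accuracy score per fold, so 5 values here
print(accuracies.mean())  # average accuracy across the folds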
So if I have a dataset split into training and testing data, and I use cross_val_score with KFold to measure the algorithm's accuracy on my training data for each fold, is the model now fitted and ready for prediction on the testing data? In other words, in the case above, can I go straight to lr.predict?
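To make that concrete, this is the step I'm unsure about (X_test here is meant to be a held-out test set that cross_val_score never saw):

# is this valid straight after cross_val_score above,
# or does lr still need an explicit lr.fit(X_train, y_train) first?
predictions = lr.predict(X_test)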
Thanks for any help.