I am doing a regression using Generalized Linear Model.I am caught offguard using the crossVal
function. My implementation so far;
x = 'Some dataset, containing the input and the output'
X = x(:,1:7);
Y = x(:,8);
cvpart = cvpartition(Y,'holdout',0.3);
Xtrain = X(training(cvpart),:);
Ytrain = Y(training(cvpart),:);
Xtest = X(test(cvpart),:);
Ytest = Y(test(cvpart),:);
mdl = GeneralizedLinearModel.fit(Xtrain,Ytrain,'linear','distr','poisson');
Ypred = predict(mdl,Xtest);
res = (Ypred - Ytest);
RMSE_test = sqrt(mean(res.^2));
The code below is for calculating cross validation for mulitple regression as obtained from this link. I want something similar for Generalized Linear Model.
c = cvpartition(Y,'k',10);
regf=@(Xtrain,Ytrain,Xtest)(Xtest*regress(Ytrain,Xtrain));
cvMse = crossval('mse',X,Y,'predfun',regf)
You can either perform the cross-validation process manually (training a model for each fold, predict outcome, compute error, then report the average across all folds), or you can use the CROSSVAL function which wraps this whole procedure in a single call.
To give an example, I will first load and prepare a dataset (a subset of the cars dataset which ships with the Statistics Toolbox):
Option 1
Here we will manually partition the data using k-fold cross-validation using cvpartition (non-stratified). For each fold, we train a GLM model using the training data, then use the model to predict output of testing data. Next we compute and store the regression mean squared error for this fold. At the end, we report the average RMSE across all partitions.
Option 2
Here we can simply call CROSSVAL with an appropriate function handle which computes the regression output given a set of train/test instances. See the doc page to understand the parameters.
You should get a similar result compared to before (slightly different of course, on account of the randomness involved in the cross-validation).