How to graph grid scores from GridSearchCV?

I am looking for a way to graph grid_scores_ from GridSearchCV in sklearn. In this example I am trying to grid search for best gamma and C parameters for an SVR algorithm. My code looks as follows:

    C_range = 10.0 ** np.arange(-4, 4)
    gamma_range = 10.0 ** np.arange(-4, 4)
    param_grid = dict(gamma=gamma_range.tolist(), C=C_range.tolist())
    grid = GridSearchCV(SVR(kernel='rbf', gamma=0.1),param_grid, cv=5)
    grid.fit(X_train,y_train)
    print(grid.grid_scores_)

After I run the code and print the grid scores I get the following outcome:

[mean: -3.28593, std: 1.69134, params: {'gamma': 0.0001, 'C': 0.0001}, mean: -3.29370, std: 1.69346, params: {'gamma': 0.001, 'C': 0.0001}, mean: -3.28933, std: 1.69104, params: {'gamma': 0.01, 'C': 0.0001}, mean: -3.28925, std: 1.69106, params: {'gamma': 0.1, 'C': 0.0001}, mean: -3.28925, std: 1.69106, params: {'gamma': 1.0, 'C': 0.0001}, mean: -3.28925, std: 1.69106, params: {'gamma': 10.0, 'C': 0.0001},etc]

I would like to visualize all the scores (mean values) depending on gamma and C parameters. The graph I am trying to obtain should look as follows:

Where x-axis is gamma, y-axis is mean score (root mean square error in this case), and different lines represent different C values.

标签： python machine-learning scikit-learn grid-search

10条回答

成全新的幸福

2楼-- · 2020-05-13 20:49

For plotting the results when tuning several hyperparameters, what I did was fixed all parameters to their best value except for one and plotted the mean score for the other parameter for each of its values.

def plot_search_results(grid):
    """
    Params: 
        grid: A trained GridSearchCV object.
    """
    ## Results from grid search
    results = grid.cv_results_
    means_test = results['mean_test_score']
    stds_test = results['std_test_score']
    means_train = results['mean_train_score']
    stds_train = results['std_train_score']

    ## Getting indexes of values per hyper-parameter
    masks=[]
    masks_names= list(grid.best_params_.keys())
    for p_k, p_v in grid.best_params_.items():
        masks.append(list(results['param_'+p_k].data==p_v))

    params=grid.param_grid

    ## Ploting results
    fig, ax = plt.subplots(1,len(params),sharex='none', sharey='all',figsize=(20,5))
    fig.suptitle('Score per parameter')
    fig.text(0.04, 0.5, 'MEAN SCORE', va='center', rotation='vertical')
    pram_preformace_in_best = {}
    for i, p in enumerate(masks_names):
        m = np.stack(masks[:i] + masks[i+1:])
        pram_preformace_in_best
        best_parms_mask = m.all(axis=0)
        best_index = np.where(best_parms_mask)[0]
        x = np.array(params[p])
        y_1 = np.array(means_test[best_index])
        e_1 = np.array(stds_test[best_index])
        y_2 = np.array(means_train[best_index])
        e_2 = np.array(stds_train[best_index])
        ax[i].errorbar(x, y_1, e_1, linestyle='--', marker='o', label='test')
        ax[i].errorbar(x, y_2, e_2, linestyle='-', marker='^',label='train' )
        ax[i].set_xlabel(p.upper())

    plt.legend()
    plt.show()

Result

1人赞添加讨论(0) 举报

倾城　Initia

3楼-- · 2020-05-13 20:43

I used grid search on xgboost with different learning rates, max depths and number of estimators.

gs_param_grid = {'max_depth': [3,4,5], 
                 'n_estimators' : [x for x in range(3000,5000,250)],
                 'learning_rate':[0.01,0.03,0.1]
                }
gbm = XGBRegressor()
grid_gbm = GridSearchCV(estimator=gbm, 
                        param_grid=gs_param_grid, 
                        scoring='neg_mean_squared_error', 
                        cv=4, 
                        verbose=1
                       )
grid_gbm.fit(X_train,y_train)

To create the graph for error vs number of estimators with different learning rates, I used the following approach:

y=[]
cvres = grid_gbm.cv_results_
best_md=grid_gbm.best_params_['max_depth']
la=gs_param_grid['learning_rate']
n_estimators=gs_param_grid['n_estimators']

for mean_score, params in zip(cvres["mean_test_score"], cvres["params"]):
    if params["max_depth"]==best_md:
        y.append(np.sqrt(-mean_score))


y=np.array(y).reshape(len(la),len(n_estimators))

%matplotlib inline
plt.figure(figsize=(8,8))
for y_arr, label in zip(y, la):
    plt.plot(n_estimators, y_arr, label=label)

plt.title('Error for different learning rates(keeping max_depth=%d(best_param))'%best_md)
plt.legend()
plt.xlabel('n_estimators')
plt.ylabel('Error')
plt.show()

The plot can be viewed here: Result

Note that the graph can similarly be created for error vs number of estimators with different max depth (or any other parameters as per the user's case).

0人赞添加讨论(0) 举报

闹够了就滚

4楼-- · 2020-05-13 20:46

@nathandrake Try the following which is adapted based off the code from @david-alvarez :

def plot_grid_search(cv_results, metric, grid_param_1, grid_param_2, name_param_1, name_param_2):
    # Get Test Scores Mean and std for each grid search
    scores_mean = cv_results[('mean_test_' + metric)]
    scores_sd = cv_results[('std_test_' + metric)]

    if grid_param_2 is not None:
        scores_mean = np.array(scores_mean).reshape(len(grid_param_2),len(grid_param_1))
        scores_sd = np.array(scores_sd).reshape(len(grid_param_2),len(grid_param_1))

    # Set plot style
    plt.style.use('seaborn')

    # Plot Grid search scores
    _, ax = plt.subplots(1,1)

    if grid_param_2 is not None:
        # Param1 is the X-axis, Param 2 is represented as a different curve (color line)
        for idx, val in enumerate(grid_param_2):
            ax.plot(grid_param_1, scores_mean[idx,:], '-o', label= name_param_2 + ': ' + str(val))
    else:
        # If only one Param1 is given
        ax.plot(grid_param_1, scores_mean, '-o')

    ax.set_title("Grid Search", fontsize=20, fontweight='normal')
    ax.set_xlabel(name_param_1, fontsize=16)
    ax.set_ylabel('CV Average ' + str.capitalize(metric), fontsize=16)
    ax.legend(loc="best", fontsize=15)
    ax.grid('on')

As you can see, I added the ability to support grid searches that include multiple metrics. You simply specify the metric you want to plot in the call to the plotting function.

Also, if your grid search only tuned a single parameter you can simply specify None for grid_param_2 and name_param_2.

Call it as follows:

plot_grid_search(grid_search.cv_results_,
                 'Accuracy',
                 list(np.linspace(0.001, 10, 50)), 
                 ['linear', 'rbf'],
                 'C',
                 'kernel')

0人赞添加讨论(0) 举报

地球回转人心会变

5楼-- · 2020-05-13 20:54

from sklearn.svm import SVC
from sklearn.grid_search import GridSearchCV
from sklearn import datasets
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

digits = datasets.load_digits()
X = digits.data
y = digits.target

clf_ = SVC(kernel='rbf')
Cs = [1, 10, 100, 1000]
Gammas = [1e-3, 1e-4]
clf = GridSearchCV(clf_,
            dict(C=Cs,
                 gamma=Gammas),
                 cv=2,
                 pre_dispatch='1*n_jobs',
                 n_jobs=1)

clf.fit(X, y)

scores = [x[1] for x in clf.grid_scores_]
scores = np.array(scores).reshape(len(Cs), len(Gammas))

for ind, i in enumerate(Cs):
    plt.plot(Gammas, scores[ind], label='C: ' + str(i))
plt.legend()
plt.xlabel('Gamma')
plt.ylabel('Mean score')
plt.show()

Code is based on this.
Only puzzling part: will sklearn always respect the order of C & Gamma -> official example uses this "ordering"

Output:

0人赞添加讨论(0) 举报

劫难

6楼-- · 2020-05-13 20:55

The order that the parameter grid is traversed is deterministic, such that it can be reshaped and plotted straightforwardly. Something like this:

scores = [entry.mean_validation_score for entry in grid.grid_scores_]
# the shape is according to the alphabetical order of the parameters in the grid
scores = np.array(scores).reshape(len(C_range), len(gamma_range))
for c_scores in scores:
    plt.plot(gamma_range, c_scores, '-')

0人赞添加讨论(0) 举报

How to graph grid scores from GridSearchCV?

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间