5.2. Grid Search: setting estimator parameters

Grid Search is used to optimize the parameters of a model (e.g. C, kernel and gamma for a Support Vector Classifier, alpha for Lasso, etc.) using an internal cross-validation scheme (see Cross-Validation: evaluating estimator performance).

5.2.1. GridSearchCV

The main class for implementing hyper-parameter grid search in scikit-learn is grid_search.GridSearchCV. This class is passed a base estimator instance (for example sklearn.svm.SVC()) along with a grid of candidate hyper-parameter values such as:

[{'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
 {'C': [1, 10, 100, 1000], 'kernel': ['linear']}]

The grid_search.GridSearchCV instance implements the usual estimator API: when “fitting” it on a dataset, all the possible combinations of hyper-parameter values are evaluated and the best combination is retained.
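
As a concrete illustration, here is a minimal sketch of this workflow on the digits toy dataset (it assumes a scikit-learn version where the class is importable from sklearn.model_selection; in the release documented here the same class is available as sklearn.grid_search.GridSearchCV):

# Minimal sketch: grid search over the parameter grid shown above.
# Assumes the sklearn.model_selection import path; older releases expose
# the same class as sklearn.grid_search.GridSearchCV.
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

digits = datasets.load_digits()

param_grid = [
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
]

# GridSearchCV wraps the base estimator and itself behaves like an estimator.
clf = GridSearchCV(svm.SVC(), param_grid)

# Fitting evaluates every hyper-parameter combination with cross-validation
# and refits the best one on the full training data.
clf.fit(digits.data, digits.target)

print(clf.best_params_)
print(clf.best_score_)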

Model selection: development and evaluation

Model selection with GridSearchCV can be seen as a way to use the labeled data to “train” the hyper-parameters of the model.

When evaluating the resulting model it is important to do so on held-out samples that were not seen during the grid search process: it is recommended to split the data into a development set (to be fed to the GridSearchCV instance) and an evaluation set to compute performance metrics.

This can be done by using the cross_validation.train_test_split utility function.
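
For instance, a minimal sketch of this development/evaluation split (again assuming the sklearn.model_selection import path, where train_test_split also lives in recent releases; the 50/50 split and random_state are arbitrary choices for illustration):

from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV, train_test_split

digits = datasets.load_digits()

# Hold out an evaluation set that the grid search never sees.
X_dev, X_eval, y_dev, y_eval = train_test_split(
    digits.data, digits.target, test_size=0.5, random_state=0)

param_grid = {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001]}
clf = GridSearchCV(svm.SVC(kernel='rbf'), param_grid)

# Hyper-parameters are tuned on the development set only ...
clf.fit(X_dev, y_dev)

# ... and final performance is reported on the untouched evaluation set.
print(clf.score(X_eval, y_eval))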

5.2.2. Examples

Note

Computations can be run in parallel, if your OS supports it, by using the keyword argument n_jobs=-1; see the function signature for more details.
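
For example (a sketch, using the digits toy dataset purely for illustration):

from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

digits = datasets.load_digits()

# n_jobs=-1 uses all available CPU cores, running the cross-validated fits
# for the different parameter settings in parallel.
clf = GridSearchCV(svm.SVC(), {'C': [1, 10, 100], 'gamma': [0.001, 0.0001]},
                   n_jobs=-1)
clf.fit(digits.data, digits.target)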
