8.6.5. sklearn.ensemble.GradientBoostingClassifier¶
- class sklearn.ensemble.GradientBoostingClassifier(loss='deviance', learn_rate=0.1, n_estimators=100, subsample=1.0, min_samples_split=1, min_samples_leaf=1, max_depth=3, init=None, random_state=None, max_features=None)¶
Gradient Boosting for classification.
GB builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage n_classes_ regression trees are fit on the negative gradient of the binomial or multinomial deviance loss function. Binary classification is a special case where only a single regression tree is induced.
Parameters : loss : {‘deviance’}, optional (default=’deviance’)
loss function to be optimized. ‘deviance’ refers to deviance (= logistic regression) for classification with probabilistic outputs.
learn_rate : float, optional (default=0.1)
learning rate shrinks the contribution of each tree by learn_rate. There is a trade-off between learn_rate and n_estimators (see the sketch after this parameter list).
n_estimators : int (default=100)
The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance.
max_depth : integer, optional (default=3)
maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.
min_samples_split : integer, optional (default=1)
The minimum number of samples required to split an internal node.
min_samples_leaf : integer, optional (default=1)
The minimum number of samples required to be at a leaf node.
subsample : float, optional (default=1.0)
The fraction of samples to be used for fitting the individual base learners. If smaller than 1.0 this results in Stochastic Gradient Boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias (illustrated in the sketch after this parameter list).
max_features : int, None, optional (default=None)
The number of features to consider when looking for the best split. Features are chosen randomly at each split point. If None, then max_features=n_features. Choosing max_features < n_features leads to a reduction of variance and an increase in bias.
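A minimal sketch (not part of the original docs) of how the shrinkage and subsampling parameters above interact. Parameter names follow the signature documented on this page (note that later scikit-learn releases renamed learn_rate to learning_rate); the dataset helper make_hastie_10_2 is a real scikit-learn utility whose +/-1 labels are remapped to 0/1 as fit() requires.

from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_hastie_10_2(n_samples=1000, random_state=0)
y = (y > 0).astype(int)  # map the +/-1 labels to 0, 1 as fit() requires

# A smaller learn_rate usually needs more boosting stages (n_estimators)
# to reach the same training loss, but often generalizes better.
slow = GradientBoostingClassifier(learn_rate=0.05, n_estimators=400)

# subsample < 1.0 gives Stochastic Gradient Boosting: each stage is fit
# on a random fraction of the samples (less variance, more bias).
stochastic = GradientBoostingClassifier(subsample=0.5, random_state=0)

for clf in (slow, stochastic):
    clf.fit(X, y)
    print(clf.score(X, y))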
References
J. Friedman, Greedy Function Approximation: A Gradient Boosting Machine, The Annals of Statistics, Vol. 29, No. 5, 2001.
J. Friedman, Stochastic Gradient Boosting, 1999.
T. Hastie, R. Tibshirani and J. Friedman. Elements of Statistical Learning Ed. 2, Springer, 2009.
Examples
>>> samples = [[0, 0, 2], [1, 0, 0]]
>>> labels = [0, 1]
>>> from sklearn.ensemble import GradientBoostingClassifier
>>> gb = GradientBoostingClassifier().fit(samples, labels)
>>> print(gb.predict([[0.5, 0, 0]]))
[0]
Attributes
feature_importances_ : array, shape = [n_features]
The feature importances (the higher, the more important the feature).
oob_score_ : array, shape = [n_estimators]
Score of the training dataset obtained using an out-of-bag estimate. The i-th score oob_score_[i] is the deviance (= loss) of the model at iteration i on the out-of-bag sample.
train_score_ : array, shape = [n_estimators]
The i-th score train_score_[i] is the deviance (= loss) of the model at iteration i on the in-bag sample. If subsample == 1 this is the deviance on the training data.
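A hedged sketch (not from the original docs) of inspecting these fitted attributes; it assumes subsample < 1.0 so that an out-of-bag sample exists and oob_score_ is populated, and reuses the make_hastie_10_2 toy dataset.

import numpy as np
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_hastie_10_2(n_samples=1000, random_state=0)
y = (y > 0).astype(int)

# subsample < 1.0 so each stage leaves an out-of-bag sample behind
clf = GradientBoostingClassifier(n_estimators=100, subsample=0.5,
                                 random_state=0).fit(X, y)

print(clf.feature_importances_)   # one importance value per feature
print(clf.train_score_[-1])       # in-bag deviance at the final stage
print(np.argmin(clf.oob_score_))  # stage with the lowest out-of-bag deviance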
Methods

decision_function(X) : Compute the decision function of X.
fit(X, y) : Fit the gradient boosting model.
fit_stage(i, X, X_argsorted, y, y_pred, ...) : Fit another stage of n_classes_ trees to the boosting model.
get_params([deep]) : Get parameters for the estimator.
predict(X) : Predict class for X.
predict_proba(X) : Predict class probabilities for X.
score(X, y) : Returns the mean accuracy on the given test data and labels.
set_params(**params) : Set the parameters of the estimator.
staged_decision_function(X) : Compute decision function of X for each iteration.
staged_predict(X) : Predict classes at each stage for X.
staged_predict_proba(X) : Predict class probabilities at each stage for X.
- __init__(loss='deviance', learn_rate=0.1, n_estimators=100, subsample=1.0, min_samples_split=1, min_samples_leaf=1, max_depth=3, init=None, random_state=None, max_features=None)¶
- decision_function(X)¶
Compute the decision function of X.
Parameters : X : array-like of shape = [n_samples, n_features]
The input samples.
Returns : score : array, shape = [n_samples, k]
The decision function of the input samples. Classes are ordered by arithmetical order. Regression and binary classification are special cases with k == 1; otherwise k == n_classes.
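A small sketch (an illustration, not from the original docs) of the shape convention described above: for a binary problem the decision function has a single column (k == 1).

from sklearn.ensemble import GradientBoostingClassifier

X = [[0, 0, 2], [1, 0, 0], [0, 1, 1], [1, 1, 0]]
y = [0, 1, 0, 1]

clf = GradientBoostingClassifier(n_estimators=10).fit(X, y)
print(clf.decision_function([[0.5, 0, 0]]).shape)  # (1, 1): binary => k == 1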
- fit(X, y)¶
Fit the gradient boosting model.
Parameters : X : array-like, shape = [n_samples, n_features]
Training vectors, where n_samples is the number of samples and n_features is the number of features. Use a Fortran-ordered (column-major) array to avoid memory copies (see the sketch after this method's description).
y : array-like, shape = [n_samples]
Target values (integers in classification, real numbers in regression). For classification, labels must correspond to classes 0, 1, ..., n_classes_-1.
Returns : self : object
Returns self.
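A minimal sketch of the Fortran-ordering note above: converting the training matrix to column-major layout once up front lets fit() avoid an internal memory copy. The synthetic data here is purely illustrative.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.RandomState(0)
X = np.asfortranarray(rng.rand(100, 4))  # column-major copy, made once up front
y = (X[:, 0] > 0.5).astype(int)

clf = GradientBoostingClassifier().fit(X, y)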
- fit_stage(i, X, X_argsorted, y, y_pred, sample_mask)¶
Fit another stage of n_classes_ trees to the boosting model.
- get_params(deep=True)¶
Get parameters for the estimator.
Parameters : deep : boolean, optional (default=True)
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- predict(X)¶
Predict class for X.
Parameters : X : array-like of shape = [n_samples, n_features]
The input samples.
Returns : y : array of shape = [n_samples]
The predicted classes.
- predict_proba(X)¶
Predict class probabilities for X.
Parameters : X : array-like of shape = [n_samples, n_features]
The input samples.
Returns : p : array of shape = [n_samples, n_classes]
The class probabilities of the input samples. Classes are ordered by arithmetical order.
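A short sketch (illustration only) of the probability output: one column per class, ordered by class label, with each row summing to 1.

from sklearn.ensemble import GradientBoostingClassifier

X = [[0, 0, 2], [1, 0, 0], [0, 1, 1], [1, 1, 0]]
y = [0, 1, 0, 1]

clf = GradientBoostingClassifier(n_estimators=10).fit(X, y)
proba = clf.predict_proba([[0.5, 0, 0]])
print(proba)        # one column per class, ordered by class label
print(proba.sum())  # each row sums to 1.0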
- score(X, y)¶
Returns the mean accuracy on the given test data and labels.
Parameters : X : array-like, shape = [n_samples, n_features]
Test samples.
y : array-like, shape = [n_samples]
True labels for X.
Returns : z : float
Mean accuracy of self.predict(X) with respect to y.
- set_params(**params)¶
Set the parameters of the estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object (see the sketch below).
Returns : self : object
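A sketch of the <component>__<parameter> convention, using a Pipeline with a PCA step; the component names 'reduce' and 'gb' are our own illustrative choices, not part of any API.

from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier

# On a plain estimator, set_params takes parameter names directly:
clf = GradientBoostingClassifier().set_params(n_estimators=200)

# In a nested object, prefix each parameter with its component's name:
pipe = Pipeline([('reduce', PCA(n_components=2)),
                 ('gb', GradientBoostingClassifier())])
pipe.set_params(reduce__n_components=3, gb__n_estimators=200)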
- staged_decision_function(X)¶
Compute decision function of X for each iteration.
This method allows monitoring (e.g. determining the error on a test set) after each stage.
Parameters : X : array-like of shape = [n_samples, n_features]
The input samples.
Returns : score : generator of array, shape = [n_samples, k]
The decision function of the input samples. Classes are ordered by arithmetical order. Regression and binary classification are special cases with k == 1; otherwise k == n_classes.
- staged_predict(X)¶
Predict classes at each stage for X.
This method allows monitoring (e.g. determining the error on a test set) after each stage, as in the sketch following this method's description.
Parameters : X : array-like of shape = [n_samples, n_features]
The input samples.
Returns : y : generator of array, shape = [n_samples]
The predicted classes of the input samples, at each stage.
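A hedged sketch of the monitoring use case mentioned above: track held-out accuracy after every boosting stage with staged_predict, again on the make_hastie_10_2 toy dataset with a simple manual train/test split.

import numpy as np
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_hastie_10_2(n_samples=2000, random_state=0)
y = (y > 0).astype(int)
X_train, y_train = X[:1000], y[:1000]
X_test, y_test = X[1000:], y[1000:]

clf = GradientBoostingClassifier(n_estimators=100).fit(X_train, y_train)

# One prediction array per boosting stage; track held-out accuracy.
test_acc = [np.mean(stage_pred == y_test)
            for stage_pred in clf.staged_predict(X_test)]
print(np.argmax(test_acc) + 1)  # stage count with the best test accuracy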
- staged_predict_proba(X)¶
Predict class probabilities at each stage for X.
This method allows monitoring (e.g. determining the error on a test set) after each stage.
Parameters : X : array-like of shape = [n_samples, n_features]
The input samples.
Returns : p : generator of array, shape = [n_samples, n_classes]
The predicted class probabilities of the input samples, at each stage.