A practical guide to gradient boosting with scikit-learn
This article covers the ordinary and histogram-based gradient boosting algorithms in scikit-learn.
For a conceptual introduction to gradient boosting, please visit our article here.
Introduction
In essence, scikit-learn offers two options for gradient-boosted trees:
- Gradient boosting
- Histogram-based gradient boosting
For gradient boosting, one can use the classes:
- GradientBoostingClassifier
- GradientBoostingRegressor
while for histogram-based gradient boosting, one can use:
- HistGradientBoostingClassifier
- HistGradientBoostingRegressor
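For reference, all four estimators live in the sklearn.ensemble module; a minimal import sketch (assuming a reasonably recent scikit-learn version) looks like this:

```python
# All four gradient boosting estimators are exposed in sklearn.ensemble
from sklearn.ensemble import (
    GradientBoostingClassifier,
    GradientBoostingRegressor,
    HistGradientBoostingClassifier,
    HistGradientBoostingRegressor,
)
```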
Each method is explained in detail in a separate section below.
It is useful to note the pros and cons when using each of the options.
In terms of speed, histogram-based gradient boosting is generally faster: in ordinary gradient boosting, the cost of finding a node split grows with the number of data points times the number of features, whereas the histogram-based variant only scans a fixed number of bins per feature after a one-off binning pass.
In terms of accuracy, ordinary gradient boosting searches over finer split points, while histogram-based gradient boosting restricts splits to the coarser bin boundaries, so the former can in principle fit a more precise model.
The histogram-based implementation in scikit-learn supports handling of missing values and categorical data, making upstream data preprocessing steps a bit easier.
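As a quick illustration of the categorical support, here is a minimal sketch using HistGradientBoostingClassifier on made-up toy data; the categorical_features argument simply tells the estimator which column holds integer-encoded categories, so no one-hot encoding is needed upstream:

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

# Toy dataset: column 0 is numeric, column 1 is an integer-encoded category.
X = np.array([[1.2, 0], [0.7, 2], [3.1, 1], [2.4, 2], [0.1, 0], [1.9, 1]])
y = np.array([0, 1, 1, 1, 0, 0])

# categorical_features marks column 1 as categorical (encoded as small
# non-negative integers); the estimator handles it natively.
clf = HistGradientBoostingClassifier(categorical_features=[1], random_state=0)
clf.fit(X, y)
```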
Gradient boosting
The classes GradientBoostingClassifier and GradientBoostingRegressor implemented in scikit-learn use the classic gradient boosting algorithm described in our article here.
In particular, at each boosting round, a regression tree (DecisionTreeRegressor) is fitted using the negative gradients of the loss function as target labels (the gradients are evaluated at the predictions from the previous iteration). The objective of the regression tree is either the mean squared error (MSE) or MSE with Friedman's improvement score. This objective is also used to compute the quality of each split.
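To make the idea concrete, here is a rough sketch of one such boosting loop for the squared-error loss, where the negative gradient is simply the residual. This is an illustration of the principle on synthetic data, not scikit-learn's internal implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())   # initial constant model

for _ in range(10):
    residual = y - prediction            # negative gradient of 0.5 * (y - f)^2
    tree = DecisionTreeRegressor(max_depth=3, criterion="friedman_mse")
    tree.fit(X, residual)                # fit a tree to the negative gradients
    prediction += learning_rate * tree.predict(X)
```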
It is important to note that other implementations such as LightGBM or XGBoost have different tree fitting procedures.
These two classes are relatively slow when the training dataset is large, as each node split is determined by evaluating all possible splits for all features.
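A typical usage sketch looks like the following; the hyperparameter values are illustrative, not tuned:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Small synthetic classification problem
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(
    n_estimators=100,          # number of boosting rounds
    learning_rate=0.1,         # shrinkage applied to each tree
    max_depth=3,               # depth of the individual regression trees
    criterion="friedman_mse",  # split quality measure discussed above
    random_state=0,
)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```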
Histogram-based gradient boosting
Inspired by LightGBM, scikit-learn also implemented histogram-based gradient boosting.
During tree fitting, the training samples are first binned into integer-valued histograms (by default at most 255 bins per feature, with one extra bin reserved for missing values). For each node split, only the bin boundaries of these histograms are considered as candidate thresholds, instead of every possible split point of every feature. This usually speeds up training considerably, as the dataset no longer needs to be sorted per feature at every split.
Like LightGBM, these classes also support missing values natively. During training, samples with missing values are assigned to whichever child node (left or right) maximises the potential gain of the split. During inference, samples with missing values are routed to the same child accordingly. If no missing values were encountered for a feature during training, samples with missing values are mapped to the child with the most samples.
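Below is a small sketch of the missing-value support on made-up toy data; NaN entries can be passed directly to fit and predict without any imputation step:

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

# NaN entries are treated as missing values; no imputation is required.
X = np.array([[1.0], [2.0], [np.nan], [4.0], [5.0], [np.nan]])
y = np.array([1.1, 2.0, 1.5, 3.9, 5.2, 4.8])

reg = HistGradientBoostingRegressor(
    max_bins=255,        # 255 is the default maximum number of bins
    min_samples_leaf=2,  # lowered only because this toy dataset is tiny
    random_state=0,
)
reg.fit(X, y)
print(reg.predict([[np.nan], [3.0]]))  # NaN rows follow the rule learned in training
```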
Summary
In conclusion, this article explored the concept of boosting algorithms within the scikit-learn library.
We reviewed the foundations of boosting, a technique for creating powerful ensembles by sequentially training weak learners. We then delved deeper into gradient boosting, a popular boosting method that builds an additive model by fitting decision trees on the negative gradients of a loss function.
Finally, we introduced histogram-based gradient boosting, a faster variant particularly suitable for larger datasets.
By understanding these boosting techniques in scikit-learn, you can leverage their ensemble power to enhance the accuracy and robustness of your machine learning models!
🚀 Subscribe to us @ newsletter.verticalsolution.io