HistGradientBoosting (sklearn)
What is HistGradientBoosting?
HistGradientBoosting is a fast, scalable gradient boosting implementation in scikit-learn. It bins continuous features into histograms to speed up split finding during training, and it handles missing values natively, so no imputation step is required.
Common uses:
- Large tabular datasets
- When you want fast, scalable boosting in sklearn
Why Use HistGradientBoosting?
- Very fast and scalable
- Handles missing values natively
- Works for both regression and classification
Key Parameters
| Parameter | Purpose |
|---|---|
| max_iter | Number of boosting iterations |
| learning_rate | Step size shrinkage |
| max_depth | Maximum depth of a tree |
| max_leaf_nodes | Maximum number of leaves per tree |
| random_state | Controls randomness for reproducibility |
Step-by-Step Example: HistGradientBoosting for Regression
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np
# Example data
X = np.arange(5).reshape(-1, 1)
y = np.array([1, 2, 3, 4, 5])
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Create and fit HistGradientBoosting regressor
hgb_reg = HistGradientBoostingRegressor(max_iter=100, learning_rate=0.1, max_depth=3, random_state=0)
hgb_reg.fit(X_train, y_train)
# Predict and evaluate
preds = hgb_reg.predict(X_test)
mse = mean_squared_error(y_test, preds)
print('Predictions:', preds)
print('MSE:', mse)
Explanation:
- HistGradientBoostingRegressor: fast, scalable regressor in sklearn.
- fit(X_train, y_train): trains the model.
- predict(X_test): makes predictions.
When to Use HistGradientBoosting
- Large datasets
- When you want fast, scalable boosting in sklearn