
HistGradientBoosting (sklearn)

What is HistGradientBoosting?

HistGradientBoosting is scikit-learn's histogram-based gradient boosting implementation (inspired by LightGBM). Instead of evaluating every candidate split on raw feature values, it bins continuous features into integer histograms, which makes training much faster on large datasets. It also handles missing values (NaN) natively, so no imputation step is required.
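
As a quick illustration of the native missing-value handling, here is a minimal sketch (the toy data is hypothetical) showing that NaN entries pass straight through fit and predict:

from sklearn.ensemble import HistGradientBoostingRegressor
import numpy as np

# Hypothetical toy data containing NaN values; no imputation is performed
X = np.array([[1.0], [2.0], [np.nan], [4.0], [5.0]] * 10)
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0] * 10)

model = HistGradientBoostingRegressor(random_state=0)
model.fit(X, y)                    # trains without raising on the NaNs
print(model.predict([[np.nan]]))   # samples with missing values predict fine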

Common uses:

  • Large tabular datasets
  • When you want fast, scalable boosting in sklearn

Why Use HistGradientBoosting?

  • Very fast and scalable
  • Handles missing values natively
  • Works for both regression and classification (see the classifier sketch after this list)
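
The classification counterpart, HistGradientBoostingClassifier, shares the same interface. A minimal sketch on hypothetical toy data:

from sklearn.ensemble import HistGradientBoostingClassifier
import numpy as np

# Hypothetical toy data: two classes separated along a single feature
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(100, 1))
y = (X.ravel() > 5).astype(int)

clf = HistGradientBoostingClassifier(random_state=0)
clf.fit(X, y)
print(clf.predict([[2.0], [8.0]]))   # expected: [0 1]
print(clf.predict_proba([[2.0]]))    # per-class probabilities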

Key Parameters

Parameter        Purpose
max_iter         Number of boosting iterations
learning_rate    Step size shrinkage
max_depth        Maximum depth of a tree
max_leaf_nodes   Maximum number of leaves per tree
random_state     Controls randomness for reproducibility
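
These parameters are the usual tuning knobs. Since a lower learning_rate generally needs more iterations, it helps to tune them together; a sketch using GridSearchCV on synthetic (hypothetical) data:

from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import GridSearchCV
import numpy as np

# Hypothetical synthetic data for the search
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X.ravel()) + rng.normal(scale=0.1, size=300)

# Search learning_rate and max_iter jointly with 3-fold cross-validation
param_grid = {'learning_rate': [0.01, 0.1, 0.3], 'max_iter': [100, 300]}
search = GridSearchCV(HistGradientBoostingRegressor(random_state=0),
                      param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)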

Step-by-Step Example: HistGradientBoosting for Regression

from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

# Example data: a noisy linear relationship. The trees cannot split a node
# with fewer than min_samples_leaf (default 20) samples, so the dataset
# needs to be reasonably large for the model to learn anything.
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 2 * X.ravel() + rng.normal(scale=0.5, size=200)

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Create and fit HistGradientBoosting regressor
hgb_reg = HistGradientBoostingRegressor(max_iter=100, learning_rate=0.1,
                                        max_depth=3, random_state=0)
hgb_reg.fit(X_train, y_train)

# Predict and evaluate
preds = hgb_reg.predict(X_test)
mse = mean_squared_error(y_test, preds)
print('Predictions (first five):', preds[:5])
print('MSE:', mse)

Explanation:

  • HistGradientBoostingRegressor: Fast, scalable regressor in sklearn.
  • fit(X_train, y_train): Trains the model.
  • predict(X_test): Makes predictions.

When to Use HistGradientBoosting

  • Large datasets
  • When you want fast, scalable boosting in sklearn