
HistGradientBoosting (sklearn)

What is HistGradientBoosting?

HistGradientBoosting is scikit-learn's histogram-based gradient boosting implementation (inspired by LightGBM). Instead of evaluating every candidate split on raw feature values, it bins continuous features into integer histograms, which makes training much faster on large datasets. It also handles missing values (NaN) natively, so no imputation step is required.
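
As a quick illustration of the native missing-value handling, here is a minimal sketch (the toy data is hypothetical) showing that NaN entries pass straight through fit and predict:

from sklearn.ensemble import HistGradientBoostingRegressor
import numpy as np

# Hypothetical toy data containing NaN values; no imputation is performed
X = np.array([[1.0], [2.0], [np.nan], [4.0], [5.0]] * 10)
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0] * 10)

model = HistGradientBoostingRegressor(random_state=0)
model.fit(X, y)                    # trains without raising on the NaNs
print(model.predict([[np.nan]]))   # samples with missing values predict fine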

Common uses:

  • Large tabular datasets
  • When you want fast, scalable boosting in sklearn

Why Use HistGradientBoosting?

  • Very fast and scalable
  • Handles missing values natively
  • Works for both regression and classification (see the classifier sketch after this list)
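
The classification counterpart, HistGradientBoostingClassifier, shares the same interface. A minimal sketch on hypothetical toy data:

from sklearn.ensemble import HistGradientBoostingClassifier
import numpy as np

# Hypothetical toy data: two classes separated along a single feature
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(100, 1))
y = (X.ravel() > 5).astype(int)

clf = HistGradientBoostingClassifier(random_state=0)
clf.fit(X, y)
print(clf.predict([[2.0], [8.0]]))   # expected: [0 1]
print(clf.predict_proba([[2.0]]))    # per-class probabilities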

Key Parameters

Parameter        Purpose
max_iter         Number of boosting iterations
learning_rate    Step size shrinkage
max_depth        Maximum depth of a tree
max_leaf_nodes   Maximum number of leaves per tree
random_state     Controls randomness for reproducibility
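
These parameters are the usual tuning knobs. Since a lower learning_rate generally needs more iterations, it helps to tune them together; a sketch using GridSearchCV on synthetic (hypothetical) data:

from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import GridSearchCV
import numpy as np

# Hypothetical synthetic data for the search
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X.ravel()) + rng.normal(scale=0.1, size=300)

# Search learning_rate and max_iter jointly with 3-fold cross-validation
param_grid = {'learning_rate': [0.01, 0.1, 0.3], 'max_iter': [100, 300]}
search = GridSearchCV(HistGradientBoostingRegressor(random_state=0),
                      param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)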

Step-by-Step Example: HistGradientBoosting for Regression

from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

# Example data: a noisy linear relationship. The trees cannot split a node
# with fewer than min_samples_leaf (default 20) samples, so the dataset
# needs to be reasonably large for the model to learn anything.
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 2 * X.ravel() + rng.normal(scale=0.5, size=200)

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Create and fit HistGradientBoosting regressor
hgb_reg = HistGradientBoostingRegressor(max_iter=100, learning_rate=0.1,
                                        max_depth=3, random_state=0)
hgb_reg.fit(X_train, y_train)

# Predict and evaluate
preds = hgb_reg.predict(X_test)
mse = mean_squared_error(y_test, preds)
print('Predictions (first five):', preds[:5])
print('MSE:', mse)

Explanation:

  • HistGradientBoostingRegressor: Fast, scalable regressor in sklearn.
  • fit(X_train, y_train): Trains the model.
  • predict(X_test): Makes predictions.

When to Use HistGradientBoosting

  • Large datasets
  • When you want fast, scalable boosting in sklearn