CatBoost

What is CatBoost?

CatBoost is an open-source gradient boosting library developed by Yandex. Its standout feature is that it handles categorical features natively, without requiring manual encoding.

Common uses:

  • Datasets with many categorical features
  • Tabular data
  • When you want easy model training

Why Use CatBoost?

  • Handles categorical features automatically
  • Works for both regression and classification (a small classification sketch follows this list)
  • Good default settings, easy to use
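
To illustrate the classification side, here is a minimal sketch using CatBoostClassifier; the feature values and labels are made up for illustration only.

from catboost import CatBoostClassifier
import numpy as np

# Toy binary labels on a single numeric feature (values are illustrative only)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Same construct/fit/predict interface as the regressor shown later
clf = CatBoostClassifier(iterations=50, learning_rate=0.1, depth=3, random_seed=0, verbose=0)
clf.fit(X, y)

print('Classes:', clf.predict(X))
print('Probabilities:', clf.predict_proba(X))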

Key Parameters

Parameter        Purpose
iterations       Number of boosting rounds
learning_rate    Step size shrinkage
depth            Maximum depth of a tree
l2_leaf_reg      L2 regularization term
random_seed      Controls randomness for reproducibility
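
The sketch below shows how these parameters are passed to the CatBoostRegressor constructor; the specific values are illustrative examples, not recommended settings.

from catboost import CatBoostRegressor

# Illustrative (not tuned) values for the parameters listed above
model = CatBoostRegressor(
    iterations=200,       # number of boosting rounds
    learning_rate=0.05,   # step size shrinkage
    depth=6,              # maximum depth of each tree
    l2_leaf_reg=3.0,      # L2 regularization on leaf values
    random_seed=42,       # reproducibility
    verbose=0,            # silence per-iteration output
)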

Step-by-Step Example: CatBoost for Regression

from catboost import CatBoostRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

# Example data
X = np.arange(5).reshape(-1, 1)
y = np.array([1, 2, 3, 4, 5])

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Create and fit CatBoost regressor
cat_reg = CatBoostRegressor(iterations=100, learning_rate=0.1, depth=3, random_seed=0, verbose=0)
cat_reg.fit(X_train, y_train)

# Predict and evaluate
preds = cat_reg.predict(X_test)
mse = mean_squared_error(y_test, preds)
print('Predictions:', preds)
print('MSE:', mse)

Explanation:

  • CatBoostRegressor: CatBoost's regressor for regression tasks.
  • fit(X_train, y_train): Trains the model.
  • predict(X_test): Makes predictions.
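  • mean_squared_error(y_test, preds): Computes the average squared difference between true and predicted values.
  • verbose=0: Suppresses CatBoost's per-iteration training output.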

When to Use CatBoost

  • When you have many categorical features (see the sketch after this list)
  • When you want easy, robust model training
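
As a minimal sketch of the categorical-feature workflow (assuming pandas is available; the dataset below is entirely made up), you can pass the names of categorical columns via cat_features and skip manual encoding.

from catboost import CatBoostClassifier
import pandas as pd

# Hypothetical data: a string-valued categorical column plus a numeric column
df = pd.DataFrame({
    'city': ['London', 'Paris', 'Paris', 'Berlin', 'London', 'Berlin'],
    'visits': [3, 10, 7, 1, 4, 2],
    'purchased': [0, 1, 1, 0, 0, 1],
})
X = df[['city', 'visits']]
y = df['purchased']

# Point CatBoost at the categorical column; no one-hot or label encoding needed
clf = CatBoostClassifier(iterations=50, depth=3, random_seed=0, verbose=0)
clf.fit(X, y, cat_features=['city'])

print(clf.predict(X))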