CatBoost
What is CatBoost?
CatBoost is a gradient boosting library from Yandex that is especially good at handling categorical features automatically (a short sketch of this appears after the list below).
Common uses:
- Datasets with many categorical features
- Tabular data
- When you want easy model training
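To see what the automatic handling looks like in practice, here is a minimal sketch (the toy data and values are made up for illustration): string-valued columns are passed as-is and declared with the cat_features argument, with no manual encoding step.

```python
from catboost import CatBoostRegressor

# Toy data (illustrative only): column 0 is a categorical string, column 1 is numeric
X = [['London', 1], ['Paris', 2], ['London', 3], ['Berlin', 4], ['Paris', 5]]
y = [100, 150, 120, 90, 160]

# Declare column 0 as categorical; CatBoost encodes it internally
model = CatBoostRegressor(iterations=50, random_seed=0, verbose=0)
model.fit(X, y, cat_features=[0])

print(model.predict([['Berlin', 2]]))
```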
Why Use CatBoost?
- Handles categorical features automatically
- Works for both regression and classification (see the classification sketch after this list)
- Good default settings, easy to use
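Since the same API covers classification, here is a minimal classification sketch (again with made-up toy data); CatBoostClassifier follows the same fit/predict pattern and adds predict_proba for class probabilities.

```python
from catboost import CatBoostClassifier

# Toy binary classification data (illustrative only)
X = [[1, 4], [2, 5], [3, 6], [4, 7], [5, 8], [6, 9]]
y = [0, 0, 0, 1, 1, 1]

clf = CatBoostClassifier(iterations=50, learning_rate=0.1, depth=3,
                         random_seed=0, verbose=0)
clf.fit(X, y)

print(clf.predict([[2, 5], [5, 8]]))        # predicted class labels
print(clf.predict_proba([[2, 5], [5, 8]]))  # class probabilities
```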
Key Parameters
| Parameter | Purpose |
|---|---|
| iterations | Number of boosting rounds |
| learning_rate | Step size shrinkage |
| depth | Maximum depth of a tree |
| l2_leaf_reg | L2 regularization term |
| random_seed | Controls randomness for reproducibility |
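As a rough illustration, these parameters might be set together like this (the values below are illustrative defaults, not tuned recommendations):

```python
from catboost import CatBoostRegressor

model = CatBoostRegressor(
    iterations=200,      # number of boosting rounds
    learning_rate=0.05,  # step size shrinkage
    depth=6,             # maximum depth of each tree
    l2_leaf_reg=3,       # L2 regularization on leaf values
    random_seed=42,      # fixed seed for reproducibility
    verbose=0,
)
```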
Step-by-Step Example: CatBoost for Regression
from catboost import CatBoostRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np
# Example data
y = np.array([1, 2, 3, 4, 5])
X = np.arange(5).reshape(-1, 1)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Create and fit CatBoost regressor
cat_reg = CatBoostRegressor(iterations=100, learning_rate=0.1, depth=3, random_seed=0, verbose=0)
cat_reg.fit(X_train, y_train)
# Predict and evaluate
preds = cat_reg.predict(X_test)
mse = mean_squared_error(y_test, preds)
print('Predictions:', preds)
print('MSE:', mse)

Explanation:
- CatBoostRegressor: CatBoost's regressor for regression tasks.
- fit(X_train, y_train): Trains the model.
- predict(X_test): Makes predictions.
When to Use CatBoost
- When you have many categorical features
- When you want easy, robust model training