XGBoost
What is XGBoost?
XGBoost (Extreme Gradient Boosting) is a fast, powerful, and popular gradient boosting library. It is widely used in machine learning competitions and real-world projects.
Common uses:
- Large datasets
- Tabular data
- Machine learning competitions (Kaggle, etc.)
Why Use XGBoost?
- Very fast and efficient
- Handles missing values natively (see the sketch after this list)
- Supports regularization to reduce overfitting
- Works for both regression and classification
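As a quick illustration of the missing-value handling, here is a minimal sketch on made-up data: XGBoost accepts `np.nan` entries directly and learns a default split direction for them, so no imputation step is required. The data and parameter values below are purely illustrative.

```python
import numpy as np
import xgboost as xgb

# Toy data with a missing entry; XGBoost treats np.nan as "missing"
# and learns which branch such rows should follow at each split.
X = np.array([[1.0], [2.0], [np.nan], [4.0], [5.0]])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

model = xgb.XGBRegressor(n_estimators=10, random_state=0)
model.fit(X, y)  # no imputation needed
print(model.predict(X))
```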
Key Parameters
| Parameter | Purpose |
|---|---|
| `n_estimators` | Number of boosting rounds |
| `learning_rate` | Step size shrinkage |
| `max_depth` | Maximum depth of a tree |
| `subsample` | Fraction of samples used per tree |
| `colsample_bytree` | Fraction of features used per tree |
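To see how these parameters fit together, here is a sketch of a configured regressor; the values are illustrative, not tuned settings. The `reg_alpha` and `reg_lambda` arguments cover the regularization mentioned earlier.

```python
import xgboost as xgb

# Illustrative settings only -- not tuned values.
model = xgb.XGBRegressor(
    n_estimators=200,      # number of boosting rounds
    learning_rate=0.05,    # step size shrinkage
    max_depth=4,           # maximum depth of each tree
    subsample=0.8,         # fraction of rows sampled per tree
    colsample_bytree=0.8,  # fraction of features sampled per tree
    reg_alpha=0.1,         # L1 regularization term
    reg_lambda=1.0,        # L2 regularization term
    random_state=0,
)
```

A lower `learning_rate` generally needs more `n_estimators` to reach the same accuracy, so the two are usually tuned together.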
Step-by-Step Example: XGBoost for Regression
```python
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

# Example data: one feature, five samples
X = np.arange(5).reshape(-1, 1)
y = np.array([1, 2, 3, 4, 5])

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Create and fit the XGBoost regressor
xgb_reg = xgb.XGBRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0)
xgb_reg.fit(X_train, y_train)

# Predict and evaluate
preds = xgb_reg.predict(X_test)
mse = mean_squared_error(y_test, preds)
print('Predictions:', preds)
print('MSE:', mse)
```

Explanation:
- `XGBRegressor`: XGBoost's regressor for regression tasks.
- `fit(X_train, y_train)`: trains the model on the training split.
- `predict(X_test)`: makes predictions for the test set.
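Since XGBoost also handles classification, here is a minimal classification sketch using `XGBClassifier` on a made-up two-class dataset. Note that class labels must be encoded as integers starting at 0.

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Made-up binary data: one feature, labels encoded as 0/1
X = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Same key parameters as the regressor, applied to classification
clf = xgb.XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print('Accuracy:', accuracy_score(y_test, clf.predict(X_test)))
```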
When to Use XGBoost
- Large datasets
- When you need speed and accuracy
- When you have missing values or need regularization