XGBoost
What is XGBoost?
XGBoost (Extreme Gradient Boosting) is a fast, powerful, and popular gradient boosting library. It is widely used in machine learning competitions and real-world projects.
Common uses:
- Large datasets
- Tabular data
- Machine learning competitions (Kaggle, etc.)
Why Use XGBoost?
- Very fast and efficient
- Handles missing values natively (see the sketch after this list)
- Supports regularization to reduce overfitting
- Works for both regression and classification
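As a quick illustration of the missing-value handling, here is a minimal sketch on made-up data: XGBoost accepts `np.nan` entries directly and learns a default split direction for them, so no imputation step is required. The data and parameter values below are purely illustrative.

```python
import numpy as np
import xgboost as xgb

# Toy data with a missing entry; XGBoost treats np.nan as "missing"
# and learns which branch such rows should follow at each split.
X = np.array([[1.0], [2.0], [np.nan], [4.0], [5.0]])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

model = xgb.XGBRegressor(n_estimators=10, random_state=0)
model.fit(X, y)  # no imputation needed
print(model.predict(X))
```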
Key Parameters
| Parameter | Purpose |
|---|---|
| `n_estimators` | Number of boosting rounds |
| `learning_rate` | Step size shrinkage |
| `max_depth` | Maximum depth of a tree |
| `subsample` | Fraction of samples used per tree |
| `colsample_bytree` | Fraction of features used per tree |
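To see how these parameters fit together, here is a sketch of a configured regressor; the values are illustrative, not tuned settings. The `reg_alpha` and `reg_lambda` arguments cover the regularization mentioned earlier.

```python
import xgboost as xgb

# Illustrative settings only -- not tuned values.
model = xgb.XGBRegressor(
    n_estimators=200,      # number of boosting rounds
    learning_rate=0.05,    # step size shrinkage
    max_depth=4,           # maximum depth of each tree
    subsample=0.8,         # fraction of rows sampled per tree
    colsample_bytree=0.8,  # fraction of features sampled per tree
    reg_alpha=0.1,         # L1 regularization term
    reg_lambda=1.0,        # L2 regularization term
    random_state=0,
)
```

A lower `learning_rate` generally needs more `n_estimators` to reach the same accuracy, so the two are usually tuned together.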
Step-by-Step Example: XGBoost for Regression
```python
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

# Example data: one feature, five samples
X = np.arange(5).reshape(-1, 1)
y = np.array([1, 2, 3, 4, 5])

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Create and fit the XGBoost regressor
xgb_reg = xgb.XGBRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0)
xgb_reg.fit(X_train, y_train)

# Predict and evaluate
preds = xgb_reg.predict(X_test)
mse = mean_squared_error(y_test, preds)
print('Predictions:', preds)
print('MSE:', mse)
```

Explanation:
- `XGBRegressor`: XGBoost's regressor for regression tasks.
- `fit(X_train, y_train)`: trains the model on the training split.
- `predict(X_test)`: makes predictions for the test set.
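Since XGBoost also handles classification, here is a minimal classification sketch using `XGBClassifier` on a made-up two-class dataset. Note that class labels must be encoded as integers starting at 0.

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Made-up binary data: one feature, labels encoded as 0/1
X = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Same key parameters as the regressor, applied to classification
clf = xgb.XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print('Accuracy:', accuracy_score(y_test, clf.predict(X_test)))
```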
When to Use XGBoost
- Large datasets
- When you need speed and accuracy
- When you have missing values or need regularization