Random Forest
What is a Random Forest?
A random forest is an ensemble of many decision trees. Each tree makes a prediction, and the forest combines them (by majority vote or averaging) into a result that is usually more accurate and stable than any single tree's.
Common uses:
- Predicting categories (classification) or numbers (regression)
- Handling complex data with many features
- Reducing overfitting compared to a single tree
Why Use Random Forests?
- More accurate than a single tree
- Reduces overfitting
- Works for both regression and classification
How Does a Random Forest Work?
- Builds many trees, each on a random bootstrap sample of the data, considering a random subset of features at each split
- Combines their predictions: majority vote for classification, average for regression (see the sketch below)
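To make the idea concrete, here is a minimal hand-rolled sketch of the bagging step (bootstrap samples plus averaging) using plain DecisionTreeRegressor objects and made-up toy data; RandomForestRegressor does all of this for you, and additionally randomizes which features each split considers:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = np.arange(20).reshape(-1, 1)
y = X.ravel() + rng.normal(0, 2, size=20)  # toy noisy linear data (illustrative only)

trees = []
for _ in range(10):
    idx = rng.integers(0, len(X), size=len(X))  # bootstrap: sample rows with replacement
    trees.append(DecisionTreeRegressor(max_depth=3).fit(X[idx], y[idx]))

# Regression: average the individual tree predictions
forest_pred = np.mean([t.predict(X) for t in trees], axis=0)
print(forest_pred[:5])
```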
Key Parameters in scikit-learn
| Parameter | Purpose |
|---|---|
| n_estimators | Number of trees in the forest |
| max_depth | Maximum depth of each tree |
| max_features | Number of features to consider at each split |
| random_state | Controls randomness for reproducibility |
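For illustration, these parameters might be set like this when building a forest (the values below are arbitrary examples, not recommendations):

```python
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(
    n_estimators=200,     # number of trees in the forest
    max_depth=5,          # maximum depth of each tree
    max_features='sqrt',  # features considered at each split
    random_state=42,      # reproducible results
)
```

Larger n_estimators generally makes predictions more stable, at the cost of training time.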
Step-by-Step Example: Random Forest for Regression
```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

# Example data
X = np.arange(5).reshape(-1, 1)
y = np.array([1, 2, 3, 4, 5])

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Create and fit random forest
rf = RandomForestRegressor(n_estimators=100, max_depth=3, random_state=0)
rf.fit(X_train, y_train)

# Predict and evaluate
preds = rf.predict(X_test)
mse = mean_squared_error(y_test, preds)
print('Predictions:', preds)
print('MSE:', mse)
```

Explanation:
- RandomForestRegressor(n_estimators=100, max_depth=3): builds a forest of 100 trees, each at most 3 levels deep.
- fit(X_train, y_train): trains the forest.
- predict(X_test): makes predictions.
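For completeness, here is the classification counterpart, where the forest takes a majority vote; this sketch uses scikit-learn's built-in iris dataset purely as convenient example data:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree votes for a class; the forest predicts the majority class
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print('Accuracy:', accuracy_score(y_test, clf.predict(X_test)))
```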
How is Random Forest Different from a Single Tree?
- Uses many trees, not just one
- Each tree sees a random subset of data and features
- More robust and less likely to overfit (see the comparison sketch below)
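One way to see these differences is to fit a single unpruned tree and a forest on the same noisy data and compare test error; a minimal sketch with synthetic data (exact numbers will vary by dataset, but the forest typically generalizes better):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X.ravel()) + rng.normal(0, 0.3, size=200)  # noisy sine (synthetic)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# An unpruned single tree tends to fit the noise; averaging many trees smooths it out
print('Single tree MSE:', mean_squared_error(y_test, tree.predict(X_test)))
print('Forest MSE:     ', mean_squared_error(y_test, forest.predict(X_test)))
```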