Random Forest

What is a Random Forest?

A random forest is an ensemble of many decision trees. Each tree makes a prediction, and the forest combines them (by voting or averaging) for a better result.
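
To make the combining step concrete, here is a minimal sketch with made-up predictions from three trees (the numbers are illustrative, not output from a real model):

import numpy as np

# Regression: the forest averages the trees' numeric predictions
tree_preds = np.array([2.0, 1.0, 3.0])
print(tree_preds.mean())  # 2.0

# Classification: the forest takes a majority vote over class labels
tree_votes = np.array([1, 0, 1])
print(np.bincount(tree_votes).argmax())  # class 1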

Common uses:

  • Predicting categories (classification) or numbers (regression)
  • Handling complex data with many features
  • Reducing overfitting compared to a single tree

Why Use Random Forests?

  • More accurate than a single tree
  • Reduces overfitting
  • Works for both regression and classification

How Does a Random Forest Work?

  • Builds many trees, each on a random bootstrap sample of the rows and a random subset of the features (see the sketch after this list)
  • Combines their predictions (majority vote for classification, average for regression)
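
Below is a rough sketch of the per-tree randomness on a toy dataset. Note that real scikit-learn forests draw a fresh feature subset at every split rather than once per tree; this simplification is just to show the idea:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))  # toy data: 6 samples, 3 features

# Bootstrap sample: draw 6 row indices with replacement
row_idx = rng.integers(0, len(X), size=len(X))

# Random feature subset (here 2 of the 3 features)
feature_idx = rng.choice(3, size=2, replace=False)

# The data one tree would be grown on
X_for_tree = X[row_idx][:, feature_idx]
print(X_for_tree.shape)  # (6, 2)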

Key Parameters in scikit-learn

Parameter      Purpose
n_estimators   Number of trees in the forest
max_depth      Maximum depth of each tree
max_features   Number of features to consider at each split
random_state   Controls randomness for reproducibility
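
As an example, all four parameters can be passed when the model is created (the values here are arbitrary choices, not recommendations):

from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(
    n_estimators=200,     # grow 200 trees
    max_depth=5,          # limit each tree to 5 levels
    max_features="sqrt",  # consider sqrt(n_features) candidates per split
    random_state=0,       # fix the randomness for reproducible results
)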

Step-by-Step Example: Random Forest for Regression

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

# Example data: a simple linear relationship between X and y
X = np.arange(5).reshape(-1, 1)
y = np.array([1, 2, 3, 4, 5])

# Split into training and test sets (default: 25% held out for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Create and fit random forest
rf = RandomForestRegressor(n_estimators=100, max_depth=3, random_state=0)
rf.fit(X_train, y_train)

# Predict and evaluate
preds = rf.predict(X_test)
mse = mean_squared_error(y_test, preds)
print('Predictions:', preds)
print('MSE:', mse)

Explanation:

  • RandomForestRegressor(n_estimators=100, max_depth=3): Builds a forest of 100 trees, each limited to a depth of 3.
  • fit(X_train, y_train): Trains every tree on a bootstrap sample of the training data.
  • predict(X_test): Averages the individual trees' predictions (verified in the sketch below).
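
The individual trees are available on the fitted model, so the averaging claim is easy to check (this reuses rf and X_test from the example above):

# rf.estimators_ holds the fitted decision trees
print(len(rf.estimators_))  # 100

# For regression, the forest prediction is the mean of the trees' predictions
tree_preds = np.array([tree.predict(X_test) for tree in rf.estimators_])
print(np.allclose(tree_preds.mean(axis=0), rf.predict(X_test)))  # True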

How is Random Forest Different from a Single Tree?

  • Uses many trees, not just one
  • Each tree sees a random subset of data and features
  • More robust and less likely to overfit
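
The classification counterpart follows the same pattern, with a majority vote instead of an average; here is a minimal sketch on a made-up two-class dataset:

from sklearn.ensemble import RandomForestClassifier

# Toy data: class 0 for small values of x, class 1 for large ones
X_clf = [[0], [1], [2], [3], [4], [5]]
y_clf = [0, 0, 0, 1, 1, 1]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_clf, y_clf)
print(clf.predict([[1], [4]]))  # expected: [0 1]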

Visualizing the Random Forest Process