
Extra Trees

What are Extra Trees?

Extra Trees (Extremely Randomized Trees) are ensemble models similar to random forests, but they add even more randomness: instead of searching for the best split threshold, they draw thresholds at random. This can make training faster and sometimes more accurate.

Common uses:

  • Predicting categories or numbers with lots of features
  • When you want a fast, robust model

Why Use Extra Trees?

  • Very fast to train
  • Reduces overfitting
  • Works for both regression and classification

How Do Extra Trees Work?

  • Build many trees (by default in scikit-learn, each tree is trained on the full dataset rather than a bootstrap sample)
  • At each split, draw a threshold at random for each candidate feature and keep the best of these random splits
  • Combine predictions from all trees (averaging for regression, majority vote for classification)
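The random-threshold step above can be sketched in plain NumPy. This is a simplified illustration of the idea, not scikit-learn's actual implementation; the helper name random_split is made up:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_split(X, feature_indices):
    """For each candidate feature, draw a split threshold uniformly
    between that feature's min and max (the Extra Trees idea).
    A real tree would then keep the candidate split with the best
    impurity reduction."""
    splits = []
    for j in feature_indices:
        lo, hi = X[:, j].min(), X[:, j].max()
        threshold = rng.uniform(lo, hi)  # random, not the optimal cut
        splits.append((j, threshold))
    return splits

X = rng.normal(size=(20, 3))
splits = random_split(X, [0, 2])
print(splits)
```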

Key Parameters in scikit-learn

Parameter       Purpose
n_estimators    Number of trees in the forest
max_depth       Maximum depth of each tree
max_features    Number of features to consider at each split
random_state    Controls randomness for reproducibility
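These parameters map directly onto the scikit-learn constructors; for example (the values here are arbitrary choices for illustration, not recommendations):

```python
from sklearn.ensemble import ExtraTreesClassifier

clf = ExtraTreesClassifier(
    n_estimators=200,     # number of trees in the forest
    max_depth=5,          # cap each tree at 5 levels
    max_features="sqrt",  # consider sqrt(n_features) features per split
    random_state=42,      # reproducible randomness
)
print(clf.get_params()["n_estimators"])
```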

Step-by-Step Example: Extra Trees for Regression

from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

# Example data: a simple linear relationship
X = np.arange(5).reshape(-1, 1)
y = np.array([1, 2, 3, 4, 5])

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Create and fit Extra Trees
et = ExtraTreesRegressor(n_estimators=100, max_depth=3, random_state=0)
et.fit(X_train, y_train)

# Predict and evaluate
preds = et.predict(X_test)
mse = mean_squared_error(y_test, preds)
print('Predictions:', preds)
print('MSE:', mse)

Explanation:

  • ExtraTreesRegressor(n_estimators=100, max_depth=3): Builds a forest of 100 extra trees, each at most 3 levels deep.
  • fit(X_train, y_train): Trains the model on the training split.
  • predict(X_test): Makes predictions, which mean_squared_error then compares against y_test.
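The same pattern works for classification with ExtraTreesClassifier. A sketch using scikit-learn's built-in iris dataset (accuracy replaces MSE as the metric):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load a small multi-class dataset and split it
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the classifier and evaluate on the held-out split
clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
print('Accuracy:', acc)
```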

How are Extra Trees Different from Random Forests?

  • Random forests search for the best threshold for each candidate feature; Extra Trees draw a random threshold per candidate feature and keep the best of those random splits
  • By default, Extra Trees train each tree on the full dataset instead of bootstrap samples
  • The extra randomness makes training faster and can reduce variance, sometimes improving accuracy
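The two models share the same API, so comparing them side by side is straightforward. A sketch on a synthetic dataset (which model scores higher depends on the data; this is an illustration, not a benchmark):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression data with mild noise
X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)

# Cross-validated R^2 for each ensemble, same settings otherwise
scores = {}
for Model in (RandomForestRegressor, ExtraTreesRegressor):
    model = Model(n_estimators=100, random_state=0)
    scores[Model.__name__] = cross_val_score(model, X, y, cv=3).mean()
    print(Model.__name__, round(scores[Model.__name__], 3))
```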

Visualizing the Extra Trees Process