Extra Trees
What are Extra Trees?
Extra Trees (Extremely Randomized Trees) are an ensemble method closely related to random forests, but they add even more randomness when splitting nodes: instead of searching for the best threshold at each split, they draw thresholds at random and keep the best of those random candidates. This can make them faster to train and sometimes more accurate.
Common uses:
- Predicting categories or numbers with lots of features
- When you want a fast, robust model
Why Use Extra Trees?
- Very fast to train
- Reduces overfitting
- Works for both regression and classification
How Do Extra Trees Work?
- Build many decision trees (by default, scikit-learn trains each tree on the full dataset rather than a bootstrap sample)
- At each split, draw one random threshold per candidate feature and keep the best of those random candidates, instead of exhaustively searching for the optimal threshold (see the sketch after this list)
- Combine the predictions of all trees: averaging for regression, majority vote for classification
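To make the random-threshold idea concrete, here is a minimal sketch of a single extremely randomized split. Everything in it (the random_split function, the candidate-feature sampling, the variance-based score) is an illustrative assumption, not scikit-learn's actual implementation:
import numpy as np

rng = np.random.default_rng(0)

def random_split(X, y, n_candidate_features=2):
    # Toy extremely randomized split: draw ONE random threshold per
    # candidate feature, then keep whichever random candidate scores best.
    best = None
    features = rng.choice(X.shape[1], size=n_candidate_features, replace=False)
    for f in features:
        lo, hi = X[:, f].min(), X[:, f].max()
        t = rng.uniform(lo, hi)  # random threshold, no exhaustive search
        left, right = y[X[:, f] <= t], y[X[:, f] > t]
        if len(left) == 0 or len(right) == 0:
            continue  # degenerate split, skip it
        # Score = weighted variance of the two children (lower is better)
        score = (len(left) * left.var() + len(right) * right.var()) / len(y)
        if best is None or score < best[0]:
            best = (score, f, t)
    return best  # (score, feature index, threshold), or None

X = rng.normal(size=(100, 5))
y = 3 * X[:, 2] + rng.normal(scale=0.1, size=100)
print(random_split(X, y, n_candidate_features=3))
Even with random thresholds, the informative feature tends to win the candidate comparison, which is why averaging many such trees still produces a strong model.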
Key Parameters in scikit-learn
| Parameter | Purpose |
|---|---|
| n_estimators | Number of trees in the forest |
| max_depth | Maximum depth of each tree |
| max_features | Number of features to consider at each split |
| random_state | Controls randomness for reproducibility |
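These parameters go straight into the estimator's constructor. A quick illustration with the classifier variant, using synthetic data from make_classification (the specific values below are arbitrary demo choices):
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic data purely for illustration
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

clf = ExtraTreesClassifier(
    n_estimators=200,     # number of trees in the forest
    max_depth=5,          # maximum depth of each tree
    max_features='sqrt',  # features considered at each split
    random_state=0,       # reproducible randomness
)
clf.fit(X, y)
print('Training accuracy:', clf.score(X, y))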
Step-by-Step Example: Extra Trees for Regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np
# Example data: a tiny 1-D toy set
X = np.arange(5).reshape(-1, 1)
y = np.array([1, 2, 3, 4, 5])
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Create and fit Extra Trees
et = ExtraTreesRegressor(n_estimators=100, max_depth=3, random_state=0)
et.fit(X_train, y_train)
# Predict and evaluate
preds = et.predict(X_test)
mse = mean_squared_error(y_test, preds)
print('Predictions:', preds)
print('MSE:', mse)
Explanation:
- ExtraTreesRegressor(n_estimators=100, max_depth=3, random_state=0): Builds a forest of 100 extra trees, each at most 3 levels deep
- fit(X_train, y_train): Trains the model on the training split
- predict(X_test): Makes predictions on the held-out data
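With only five data points, the MSE above is mostly illustrative. On realistic data you would usually cross-validate rather than rely on a single split; a minimal sketch, assuming synthetic data from make_regression (the sample size and noise level are arbitrary):
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import cross_val_score

# Larger synthetic dataset, since 5 points are too few to score meaningfully
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

et = ExtraTreesRegressor(n_estimators=100, random_state=0)
scores = cross_val_score(et, X, y, cv=5, scoring='neg_mean_squared_error')
print('Mean MSE across 5 folds:', -scores.mean())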
How are Extra Trees Different from Random Forests?
- Split thresholds are drawn at random instead of optimized; each tree keeps the best of its random candidates rather than searching every possible split
- By default in scikit-learn, each tree is trained on the full dataset rather than a bootstrap sample
- Skipping the exhaustive split search makes training faster, and the added randomness can reduce variance, sometimes improving accuracy
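One way to check both claims on your own data is to run the two ensembles side by side. A rough sketch on synthetic data (timings and accuracies depend heavily on the dataset and parameters, so treat any difference as indicative, not a general rule):
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

for Model in (RandomForestClassifier, ExtraTreesClassifier):
    model = Model(n_estimators=200, random_state=0)
    start = time.perf_counter()
    acc = cross_val_score(model, X, y, cv=5).mean()  # 5-fold accuracy
    elapsed = time.perf_counter() - start
    print(f'{Model.__name__}: accuracy={acc:.3f}, time={elapsed:.2f}s')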