Wrapper-Based Feature Selection
What is Wrapper-Based Feature Selection?
Wrapper methods help you find the best set of features by actually testing them with a machine learning model. They try different combinations and keep the ones that work best.
Example: Try using just 'height' and 'weight' to predict something, then try 'height' and 'shoe_size', and so on.
Why Use Wrapper Methods?
- They can find the best features for your specific model.
- They may take more time, but can give better results.
Comparison Table
| Method | How it Works | Speed | Uses Model? | Example Techniques |
|---|---|---|---|---|
| Filter | Uses stats to pick features | Fast | No | Correlation, Variance |
| Wrapper | Tries feature sets with a model | Slow | Yes | RFE, Forward Selection |
| Embedded | Picks features during model training | Medium | Yes | Lasso, Decision Trees |
Common Wrapper Methods
- Forward Selection: Start with no features, add one at a time.
- Backward Elimination: Start with all features, remove one at a time.
- Recursive Feature Elimination (RFE): Remove the least important features step by step.
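Of the three methods above, backward elimination is the one not shown in the worked examples below. As a minimal sketch, it can be done with scikit-learn's SequentialFeatureSelector by setting direction='backward' (the iris dataset here is just a stand-in with 4 features and enough rows for cross-validation):

```python
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

# Toy data: the iris dataset (150 samples, 4 features, 3 classes)
X, y = load_iris(return_X_y=True)

model = LogisticRegression(max_iter=1000)

# Backward elimination: start with all 4 features, drop one at a time
selector = SequentialFeatureSelector(model, n_features_to_select=2,
                                     direction='backward')
selector = selector.fit(X, y)

print("Kept feature indices:", selector.get_support(indices=True))
```

The only change from forward selection is direction='backward'; everything else (the model, the stopping count) works the same way.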
How Wrapper Methods Work
Example 1: Recursive Feature Elimination (RFE)
```python
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
import pandas as pd

# Small example dataset
data = pd.DataFrame({
    'height': [150, 160, 170, 180, 190],
    'weight': [50, 60, 70, 80, 90],
    'shoe_size': [6, 7, 8, 9, 10],
    'target': [0, 1, 1, 1, 0]
})
X = data[['height', 'weight', 'shoe_size']]
y = data['target']

# Step 1: Create a model
model = LogisticRegression()

# Step 2: Use RFE to select the 2 best features
selector = RFE(model, n_features_to_select=2)
selector = selector.fit(X, y)

# Step 3: Get selected feature names
selected = X.columns[selector.support_].tolist()
print("Selected features:", selected)
```
- RFE: Repeatedly fits the model and removes the least important feature until the requested number remains.
- LogisticRegression(): The model used to test the features.
- selector.support_: A boolean mask showing which features were chosen.
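Besides support_, a fitted RFE selector also exposes a ranking_ attribute: selected features get rank 1, and larger numbers mark features that were eliminated earlier. A small sketch reusing the same toy data, this time keeping only 1 feature so every column gets a distinct rank:

```python
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
import pandas as pd

data = pd.DataFrame({
    'height': [150, 160, 170, 180, 190],
    'weight': [50, 60, 70, 80, 90],
    'shoe_size': [6, 7, 8, 9, 10],
    'target': [0, 1, 1, 1, 0]
})
X = data[['height', 'weight', 'shoe_size']]
y = data['target']

# Keep a single feature so the elimination order is fully visible
selector = RFE(LogisticRegression(), n_features_to_select=1)
selector.fit(X, y)

# ranking_: 1 = selected; larger numbers were eliminated earlier
for name, rank in zip(X.columns, selector.ranking_):
    print(f"{name}: rank {rank}")
```

This is handy when you want to see not just which features survived, but the order in which the others were dropped.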
Example 2: Forward Selection (Step-by-Step)
```python
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
import pandas as pd

# Same example data as above
data = pd.DataFrame({
    'height': [150, 160, 170, 180, 190],
    'weight': [50, 60, 70, 80, 90],
    'shoe_size': [6, 7, 8, 9, 10],
    'target': [0, 1, 1, 1, 0]
})
X = data[['height', 'weight', 'shoe_size']]
y = data['target']

# Step 1: Create a model
model = LogisticRegression()

# Step 2: Use forward selection to pick the 2 best features
# cv=2 because the smallest class in this tiny dataset has only 2 samples,
# so the default 5-fold cross-validation would fail
selector = SequentialFeatureSelector(model, n_features_to_select=2,
                                     direction='forward', cv=2)
selector = selector.fit(X, y)

# Step 3: Get selected feature names
selected = X.columns[selector.get_support()].tolist()
print("Selected features:", selected)
```
- SequentialFeatureSelector: Adds features one at a time, keeping the addition that most improves the cross-validated score.
- direction='forward': Start with no features and add them.
- get_support(): Returns a boolean mask of the features that were kept.
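After selecting a subset, you will usually want to check whether the smaller feature set actually holds up against the full one. A minimal sketch (using the iris dataset rather than the tiny table above, since a meaningful cross-validated comparison needs more rows):

```python
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Pick 2 of the 4 iris features by forward selection
selector = SequentialFeatureSelector(model, n_features_to_select=2,
                                     direction='forward')
X_selected = selector.fit_transform(X, y)

# Compare cross-validated accuracy: all features vs. the selected subset
score_all = cross_val_score(model, X, y, cv=5).mean()
score_sel = cross_val_score(model, X_selected, y, cv=5).mean()
print(f"All features: {score_all:.3f}, selected: {score_sel:.3f}")
```

If the selected subset scores close to (or better than) the full set, the dropped features were carrying little useful signal for this model.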