
Wrapper Based Feature Selection

What is Wrapper Based Feature Selection?

Wrapper methods help you find the best set of features by actually testing them with a machine learning model. They try different combinations and keep the ones that work best.

Example: Try using just 'height' and 'weight' to predict something, then try 'height' and 'shoe_size', and so on.

Why Use Wrapper Methods?

  • They can find the best features for your specific model.
  • They may take more time, but can give better results.

Comparison Table

| Method   | How it Works                          | Speed  | Uses Model? | Example Techniques     |
|----------|---------------------------------------|--------|-------------|------------------------|
| Filter   | Uses stats to pick features           | Fast   | No          | Correlation, Variance  |
| Wrapper  | Tries feature sets with a model       | Slow   | Yes         | RFE, Forward Selection |
| Embedded | Picks features during model training  | Medium | Yes         | Lasso, Decision Trees  |
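For contrast with the Filter row above, here is a minimal filter-method sketch using scikit-learn's VarianceThreshold. The 'constant' column is a made-up zero-variance feature added just for this illustration:

```python
from sklearn.feature_selection import VarianceThreshold
import pandas as pd

data = pd.DataFrame({
    'height': [150, 160, 170, 180, 190],
    'constant': [1, 1, 1, 1, 1],   # zero variance, carries no information
    'shoe_size': [6, 7, 8, 9, 10],
})

# A filter method: drops features purely from statistics, no model involved
selector = VarianceThreshold(threshold=0.0)
selector.fit(data)

kept = data.columns[selector.get_support()].tolist()
print("Kept features:", kept)  # 'constant' is dropped
```

Notice how fast and simple this is compared to a wrapper method: no model is trained, so nothing tells us whether the kept features actually help predictions.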

Common Wrapper Methods

  • Forward Selection: Start with no features, add one at a time.
  • Backward Elimination: Start with all features, remove one at a time.
  • Recursive Feature Elimination (RFE): Remove the least important features step by step.

How Wrapper Methods Work

Example 1: Recursive Feature Elimination (RFE)

from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
import pandas as pd

# Small example dataset
data = pd.DataFrame({
  'height': [150, 160, 170, 180, 190],
  'weight': [50, 60, 70, 80, 90],
  'shoe_size': [6, 7, 8, 9, 10],
  'target': [0, 1, 1, 1, 0]
})

X = data[['height', 'weight', 'shoe_size']]
y = data['target']

# Step 1: Create a model
model = LogisticRegression()

# Step 2: Use RFE to select 2 best features
selector = RFE(model, n_features_to_select=2)
selector = selector.fit(X, y)

# Step 3: Get selected feature names
selected = X.columns[selector.support_].tolist()
print("Selected features:", selected)

  • RFE: Repeatedly fits the model and removes the least important feature until the requested number remain.
  • LogisticRegression(): The model used to test features.
  • selector.support_: Shows which features were chosen.
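Besides support_, a fitted RFE selector also exposes a ranking_ attribute: every selected feature gets rank 1, and eliminated features get 2, 3, … in reverse order of removal. A quick sketch, reusing the same toy data as above:

```python
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
import pandas as pd

data = pd.DataFrame({
    'height': [150, 160, 170, 180, 190],
    'weight': [50, 60, 70, 80, 90],
    'shoe_size': [6, 7, 8, 9, 10],
    'target': [0, 1, 1, 1, 0]
})
X = data[['height', 'weight', 'shoe_size']]
y = data['target']

selector = RFE(LogisticRegression(), n_features_to_select=2).fit(X, y)

# Rank 1 = selected; higher ranks were eliminated earlier
for name, rank in zip(X.columns, selector.ranking_):
    print(f"{name}: rank {rank}")
```

This is handy when you want to see not just which features survived, but the order in which the others were discarded.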

Example 2: Forward Selection (Step-by-Step)

from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
import pandas as pd

# Same example data as above
data = pd.DataFrame({
  'height': [150, 160, 170, 180, 190],
  'weight': [50, 60, 70, 80, 90],
  'shoe_size': [6, 7, 8, 9, 10],
  'target': [0, 1, 1, 1, 0]
})

X = data[['height', 'weight', 'shoe_size']]
y = data['target']

# Step 1: Create a model
model = LogisticRegression()

# Step 2: Use forward selection to pick 2 best features
# cv=2 keeps cross-validation valid on this tiny dataset (only two class-0 samples,
# so the default 5-fold split would fail)
selector = SequentialFeatureSelector(model, n_features_to_select=2, direction='forward', cv=2)
selector = selector.fit(X, y)

# Step 3: Get selected feature names
selected = X.columns[selector.get_support()].tolist()
print("Selected features:", selected)

  • SequentialFeatureSelector: Adds features one by one to find the best set.
  • direction='forward': Means we start with no features and add them.
  • get_support(): Tells us which features were kept.
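Backward Elimination, the third method listed earlier, can be sketched with the same class by flipping the direction argument. This starts from all features and drops the least useful one at a time; the toy data and cv=2 choice mirror the forward example:

```python
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
import pandas as pd

data = pd.DataFrame({
    'height': [150, 160, 170, 180, 190],
    'weight': [50, 60, 70, 80, 90],
    'shoe_size': [6, 7, 8, 9, 10],
    'target': [0, 1, 1, 1, 0]
})

X = data[['height', 'weight', 'shoe_size']]
y = data['target']

model = LogisticRegression()

# direction='backward' starts with all features and removes one per step;
# cv=2 because this tiny dataset has only two samples of class 0
selector = SequentialFeatureSelector(model, n_features_to_select=2,
                                     direction='backward', cv=2)
selector.fit(X, y)

selected = X.columns[selector.get_support()].tolist()
print("Selected features:", selected)
```

Forward and backward selection can pick different feature sets on the same data, since they explore the search space from opposite ends.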