Filter Based Feature Selection

What is Filter Based Feature Selection?

Filter methods help you pick the most useful features (columns) in your data before you train any machine learning model. They are fast and simple: each feature is scored with a statistic, such as its variance or its correlation with the target, and low-scoring features are dropped.

Example: If a column doesn't change much or isn't related to what you want to predict, you can remove it.

Why Use Filter Methods?

They make your model faster and easier to understand.
They help avoid overfitting (when your model learns noise instead of patterns).

Comparison Table

Method	How it Works	Speed	Uses Model?	Example Techniques
Filter	Uses stats to pick features	Fast	No	Correlation, Variance
Wrapper	Tries feature sets with a model	Slow	Yes	RFE, Forward Selection
Embedded	Picks features during model training	Medium	Yes	Lasso, Decision Trees
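To make the three rows of the table concrete, here is a minimal sketch that runs one technique from each family on the same data. The synthetic dataset from make_classification, the choice of k=3, and the Lasso alpha value are assumptions made for illustration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import Lasso, LogisticRegression

# Synthetic data (assumed for illustration): 200 rows, 10 features, 3 informative
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

# Filter: score each feature with a statistic (ANOVA F-test); no model is trained
filter_sel = SelectKBest(score_func=f_classif, k=3).fit(X, y)

# Wrapper: repeatedly fit a model and drop the weakest feature (RFE)
wrapper_sel = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3).fit(X, y)

# Embedded: the L1 penalty in Lasso shrinks some coefficients to zero during training
embedded_sel = SelectFromModel(Lasso(alpha=0.05)).fit(X, y)

for name, sel in [("filter", filter_sel), ("wrapper", wrapper_sel), ("embedded", embedded_sel)]:
    print(name, int(sel.get_support().sum()), "features kept")
```

The filter step only computes per-feature statistics, while the wrapper step refits the model many times, which is where the speed difference in the table comes from.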

Common Filter Methods

Correlation: For numeric features. Remove features that show little or no linear relationship with the target.
Variance Threshold: Remove features that are almost always the same value.
Chi-Squared Test: For categorical features with a categorical target (not shown here).

How Filter Methods Work

Example 1: Correlation Filter (Step-by-Step)

import pandas as pd

# Small example dataset
data = pd.DataFrame({
  'height': [150, 160, 170, 180, 190],
  'weight': [50, 60, 70, 80, 90],
  'shoe_size': [6, 7, 8, 9, 10],
  'random_noise': [1, 1, 1, 1, 1],
  'target': [0, 1, 1, 1, 0]
})

# Step 1: Calculate correlation with the target
correlations = data.corr()['target'].drop('target')
print(correlations)

# Step 2: Keep features with absolute correlation above 0.3
# (a NaN correlation, e.g. from a constant column, compares as False and is dropped)
selected = correlations[correlations.abs() > 0.3].index.tolist()
print("Selected features:", selected)

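data: Our example table.
data.corr(): Computes the Pearson correlation between every pair of columns. ['target'] picks out the correlations with the target, and .drop('target') removes the target's correlation with itself.
We keep features whose absolute correlation is above 0.3 (you can change this number).
Note: In this toy dataset the target is 0 at both ends and 1 in the middle, so its linear correlation with height, weight, and shoe_size is essentially 0. random_noise is constant, so its correlation is NaN. The printed list of selected features is therefore empty.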
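To show what the correlation number means, here is a minimal sketch that computes the Pearson correlation by hand with NumPy. The helper name pearson_r is hypothetical and introduced only for illustration; the sample lists reuse columns from the example table.

```python
import numpy as np

def pearson_r(x, y):
    """Hypothetical helper: Pearson correlation of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean
    dy = y - y.mean()
    denom = np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
    # A constant column has zero spread, so the correlation is undefined
    return float("nan") if denom == 0 else float((dx * dy).sum() / denom)

height = [150, 160, 170, 180, 190]
weight = [50, 60, 70, 80, 90]
target = [0, 1, 1, 1, 0]
constant = [1, 1, 1, 1, 1]

r_linear = pearson_r(height, weight)      # perfectly linear pair: 1.0
r_target = pearson_r(height, target)      # symmetric target: approximately 0
r_constant = pearson_r(height, constant)  # constant column: nan
print(r_linear, r_target, r_constant)
```

Pearson correlation only measures linear relationships, so a feature can matter to the target and still score close to 0, as height does here.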

Example 2: Variance Threshold (Step-by-Step)

from sklearn.feature_selection import VarianceThreshold
import pandas as pd

# Same example data as above
data = pd.DataFrame({
  'height': [150, 160, 170, 180, 190],
  'weight': [50, 60, 70, 80, 90],
  'shoe_size': [6, 7, 8, 9, 10],
  'random_noise': [1, 1, 1, 1, 1],
  'target': [0, 1, 1, 1, 0]
})

# Step 1: Separate the features from the target
X = data.drop(columns='target')

# Step 2: Remove features whose variance is not above 0.1
selector = VarianceThreshold(threshold=0.1)
selected_data = selector.fit_transform(X)

# Step 3: Use the boolean mask to recover the names of the kept columns
selected_columns = X.columns[selector.get_support()].tolist()
print("Selected features:", selected_columns)

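VarianceThreshold: Removes columns that don't change much. It looks only at the features, never at the target.
threshold=0.1: Only keep columns whose variance is greater than 0.1. Here random_noise is constant (variance 0), so it is removed, while height, weight, and shoe_size are kept.
get_support(): Returns a True/False mask that tells us which columns were kept.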
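Variance depends on the units of a column, so the same threshold can treat the same information differently. The sketch below illustrates this under assumed data: height_cm and height_m are hypothetical columns holding the same heights in centimetres and metres.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Assumed toy data: the same heights in two units, plus a constant column
X = pd.DataFrame({
    'height_cm': [150, 160, 170, 180, 190],
    'height_m': [1.5, 1.6, 1.7, 1.8, 1.9],
    'constant': [1, 1, 1, 1, 1],
})

# Population variance (ddof=0), which is what VarianceThreshold compares to the threshold
variances = X.var(ddof=0)
print(variances)

selector = VarianceThreshold(threshold=0.1).fit(X)
kept = X.columns[selector.get_support()].tolist()
print("Kept:", kept)
```

With this threshold only height_cm survives, even though height_m carries the same information. One common way around this is to put features on a comparable scale before choosing a threshold, or to use a threshold of 0 so that only constant columns are removed.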

When to Use Filter Methods

When you want a quick, simple way to reduce features.
When you have lots of features.
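When there are many columns, the two filters can be chained in one step. The following is a minimal sketch, not a definitive implementation: filter_features is a hypothetical helper, its default thresholds are assumptions, and the small table is made-up toy data.

```python
import pandas as pd

def filter_features(df, target, min_variance=0.0, min_abs_corr=0.3):
    """Hypothetical helper: variance filter first, then correlation filter."""
    X = df.drop(columns=target)
    # Drop constant or near-constant columns first so they do not produce NaN correlations
    X = X.loc[:, X.var(ddof=0) > min_variance]
    # Absolute Pearson correlation of each remaining column with the target
    corr = X.corrwith(df[target]).abs()
    return corr[corr > min_abs_corr].index.tolist()

# Made-up toy data for illustration
data = pd.DataFrame({
    'hours_studied': [1, 2, 3, 4, 5, 6],
    'seat_number': [3, 1, 4, 1, 5, 2],
    'constant': [1, 1, 1, 1, 1, 1],
    'passed': [0, 0, 0, 1, 1, 1],
})

selected = filter_features(data, 'passed')
print("Selected features:", selected)
```

Here constant is removed by the variance step, seat_number by the correlation step, and only hours_studied is kept.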
