Dimensionality Reduction

What is Dimensionality Reduction?

Dimensionality reduction shrinks a dataset by reducing the number of features (columns) while keeping as much useful information as possible. Fewer features make the data easier to visualize, speed up training, and can improve model performance.

Why Use Dimensionality Reduction?

  • Datasets with many features can be hard to work with
  • Reduces noise and redundancy
  • Makes data easier to visualize and understand
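
To make the redundancy point concrete, here is a small sketch on made-up data (the columns a and b and the random seed are assumptions for illustration only). The third column is just the sum of the first two, so it adds no new information, and PCA should report almost no variance along the third component.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical toy data: two independent columns plus one redundant column
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X_redundant = np.column_stack([a, b, a + b])  # third column = a + b

# Keep all components so the full variance breakdown is visible
pca_redundant = PCA().fit(X_redundant)
ratios = pca_redundant.explained_variance_ratio_
print("Explained variance ratio:", ratios)

# The last ratio is close to zero: two components describe the data
print("Variance left for the third component:", ratios[2])
```

In a case like this, dropping the last component loses essentially nothing, which is the kind of redundancy dimensionality reduction is meant to remove.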

Example: Principal Component Analysis (PCA)

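PCA is a popular method for dimensionality reduction. It builds new features, called principal components, as linear combinations of the original features. The components are uncorrelated with each other and ordered by how much of the data's variance they capture, so keeping only the first few often retains most of the variation in the data.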

import pandas as pd
from sklearn.decomposition import PCA
import numpy as np

# Sample data with 3 features
X = np.array([
  [2.5, 2.4, 1.2],
  [0.5, 0.7, 0.3],
  [2.2, 2.9, 1.7],
  [1.9, 2.2, 1.1],
  [3.1, 3.0, 2.0],
  [2.3, 2.7, 1.6],
  [2, 1.6, 0.9],
  [1, 1.1, 0.2],
  [1.5, 1.6, 0.7],
  [1.1, 0.9, 0.1]
])

# Reduce to 2 principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print("Reduced data:", X_reduced)
print("Explained variance ratio:", pca.explained_variance_ratio_)

# Inverse transform: project back to original space
X_original_space = pca.inverse_transform(X_reduced)
print("Back in original space:", X_original_space)

  • import pandas as pd: Loads pandas (not used in this example, but often helpful for real data).
  • from sklearn.decomposition import PCA: Imports PCA for dimensionality reduction.
  • X = np.array([...]): Example dataset with 3 features.
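  • pca = PCA(n_components=2): Sets up PCA to keep 2 components.
  • pca.fit_transform(X): Learns the components from X and returns the data projected onto them (2 columns instead of 3).
  • pca.explained_variance_ratio_: Shows the fraction of the total variance captured by each kept component.
  • pca.inverse_transform(X_reduced): Maps the reduced data back to the original 3-feature space. The result only approximates X, because whatever the dropped component carried is lost.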
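
One way to decide how many components to keep is to look at the cumulative explained variance. The sketch below reuses the sample data from the example; the 0.95 target is an arbitrary threshold chosen for illustration, and svd_solver="full" is passed explicitly so that the fractional form of n_components is accepted.

```python
import numpy as np
from sklearn.decomposition import PCA

# Same sample data as in the example above
X = np.array([
    [2.5, 2.4, 1.2], [0.5, 0.7, 0.3], [2.2, 2.9, 1.7], [1.9, 2.2, 1.1],
    [3.1, 3.0, 2.0], [2.3, 2.7, 1.6], [2.0, 1.6, 0.9], [1.0, 1.1, 0.2],
    [1.5, 1.6, 0.7], [1.1, 0.9, 0.1],
])

# Fit with all components to see the full variance breakdown
pca_full = PCA().fit(X)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
print("Cumulative explained variance:", cumulative)

# Smallest number of components whose cumulative ratio reaches the target
target = 0.95  # assumed threshold, for illustration only
k = int(np.searchsorted(cumulative, target)) + 1
print("Components needed:", k)

# scikit-learn can make the same choice when n_components is a fraction
pca_auto = PCA(n_components=target, svd_solver="full").fit(X)
print("Chosen by PCA:", pca_auto.n_components_)

# The reconstruction error shows what the dropped components cost
X_back = pca_auto.inverse_transform(pca_auto.transform(X))
mse = np.mean((X - X_back) ** 2)
print("Mean squared reconstruction error:", mse)
```

A small reconstruction error suggests the dropped components carried little of the original signal; what counts as small depends on the scale of the features.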

[Figure: Dimensionality Reduction Process]

When to Use

  • Your dataset has many features
  • You want to visualize high-dimensional data
  • You want to remove noise or redundancy
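
As a rough illustration of the visualization case, the sketch below projects the digits dataset that ships with scikit-learn (64 pixel features per sample) onto 2 principal components. The dataset choice is an assumption made for this example; any table with many numeric columns would work the same way.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Bundled dataset: 8x8 digit images flattened to 64 features each
digits = load_digits()
X_digits, y_digits = digits.data, digits.target
print("Original shape:", X_digits.shape)

# Project onto the 2 directions with the most variance
pca_2d = PCA(n_components=2)
X_2d = pca_2d.fit_transform(X_digits)
print("Reduced shape:", X_2d.shape)

# Two components keep only part of the variance, so the 2D view is lossy
print("Variance kept:", pca_2d.explained_variance_ratio_.sum())
```

The two columns of X_2d can then be passed to a scatter plot, colored by y_digits, to see how the classes spread out in two dimensions.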

Summary Table

Step                What it Does
Fit PCA             Finds principal components
Transform           Reduces data to fewer dimensions
Explained Variance  Shows info kept by each component
Inverse Transform   Projects reduced data back to original space
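
To show what each step in the table does, here is a minimal NumPy sketch of the same four steps using a singular value decomposition of the centered data. It is an illustration under simplifying assumptions, not scikit-learn's actual implementation, and the variable names are made up for this example. The results are compared with PCA at the end; individual components may differ in sign, so the projected data is compared by absolute value.

```python
import numpy as np
from sklearn.decomposition import PCA

# Same sample data as in the example above
X = np.array([
    [2.5, 2.4, 1.2], [0.5, 0.7, 0.3], [2.2, 2.9, 1.7], [1.9, 2.2, 1.1],
    [3.1, 3.0, 2.0], [2.3, 2.7, 1.6], [2.0, 1.6, 0.9], [1.0, 1.1, 0.2],
    [1.5, 1.6, 0.7], [1.1, 0.9, 0.1],
])
n_keep = 2

# Fit: center the data, then take the SVD of the centered matrix
mean = X.mean(axis=0)
X_centered = X - mean
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
components = Vt[:n_keep]  # rows are the top principal directions

# Transform: project the centered data onto the kept directions
Z = X_centered @ components.T

# Explained variance: squared singular values, as a share of the total
explained_variance = S**2 / (X.shape[0] - 1)
ratio = explained_variance / explained_variance.sum()

# Inverse transform: map back to the original space and restore the mean
X_approx = Z @ components + mean

# Compare with scikit-learn (signs of individual components may differ)
pca = PCA(n_components=n_keep, svd_solver="full").fit(X)
Z_sklearn = pca.transform(X)
print("Same projection:", np.allclose(np.abs(Z), np.abs(Z_sklearn)))
print("Same variance ratio:",
      np.allclose(ratio[:n_keep], pca.explained_variance_ratio_))
print("Same reconstruction:",
      np.allclose(X_approx, pca.inverse_transform(Z_sklearn)))
```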

Dimensionality reduction helps you simplify your data while keeping the most important information.