
Heterogeneous Transformation

What is Heterogeneous Transformation?

Heterogeneous transformation means applying different preprocessing steps to different types of features in the same dataset. For example, you might scale the numerical features and one-hot encode the categorical features in a single preprocessing step.

This is useful when your data has both numbers and categories, and each needs a different kind of transformation.

Why Use Heterogeneous Transformation?

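  • Real-world data often mixes types (numbers, categories, etc.)
  • Each type needs its own preprocessing: scaling makes sense for a salary, not for a city name
  • All preprocessing lives in one object that is fit once on training data and then reused unchanged on new data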
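The reuse point is easiest to see when the preprocessor is combined with a model. The sketch below assumes a scikit-learn `Pipeline` with a `LogisticRegression` classifier (neither appears in the original text), and the training rows and binary target are made up for illustration. It also assumes `handle_unknown='ignore'` on the encoder, so that a city never seen during fitting is encoded as all zeros instead of raising an error.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical training data and a made-up binary target
train = pd.DataFrame({
    'age': [25, 32, 47, 51, 38, 29],
    'salary': [50000, 60000, 80000, 90000, 70000, 52000],
    'city': ['Paris', 'London', 'Paris', 'Berlin', 'London', 'Berlin'],
})
y = [0, 0, 1, 1, 1, 0]

preprocessor = ColumnTransformer(transformers=[
    ('num', StandardScaler(), ['age', 'salary']),
    # Unseen categories become an all-zero one-hot row instead of an error
    ('cat', OneHotEncoder(handle_unknown='ignore'), ['city']),
])

# One object holds both the preprocessing and the model
model = Pipeline(steps=[
    ('preprocess', preprocessor),
    ('classifier', LogisticRegression()),
])

# Fitting learns the scaler's means/stds and the encoder's categories from train only
model.fit(train, y)

# New rows pass through the same fitted transformations; 'Madrid' was never seen
new = pd.DataFrame({
    'age': [40, 30],
    'salary': [75000, 55000],
    'city': ['Paris', 'Madrid'],
})
predictions = model.predict(new)
print(predictions)
```

Because the scaler statistics and the category list are learned inside `fit`, calling `predict` on new data cannot accidentally refit them on the new rows.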

Example: Using ColumnTransformer

scikit-learn's ColumnTransformer (in sklearn.compose) applies a separate transformer to each listed group of columns and concatenates the results side by side into one output array.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Sample data with numerical and categorical features
data = {
    'age': [25, 32, 47, 51],
    'salary': [50000, 60000, 80000, 90000],
    'city': ['Paris', 'London', 'Paris', 'Berlin']
}
df = pd.DataFrame(data)

# Define which columns get which transformer
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), ['age', 'salary']),
        ('cat', OneHotEncoder(), ['city'])
    ]
)

# Fit and transform the data
transformed = preprocessor.fit_transform(df)
print(transformed)
```

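  • import pandas as pd: Loads pandas, used here to hold the data in a DataFrame.
  • from sklearn.compose import ColumnTransformer: Applies a different transformer to each selected group of columns.
  • from sklearn.preprocessing import StandardScaler, OneHotEncoder: StandardScaler rescales numeric columns to zero mean and unit variance; OneHotEncoder turns each category into its own 0/1 column.
  • data = {...}: Example data with two numeric columns (age, salary) and one categorical column (city).
  • preprocessor = ColumnTransformer(...): Each tuple is (name, transformer, columns), assigning one transformer to a list of columns.
  • preprocessor.fit_transform(df): Fits each transformer on its own columns, transforms them, and concatenates the results: two scaled columns followed by three one-hot columns (Berlin, London, Paris).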

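Conceptually, ColumnTransformer fits each transformer only on its own columns and then stacks the outputs horizontally. The sketch below reproduces that by hand with NumPy and checks that it matches the ColumnTransformer result on the same sample data. The `.toarray()` call is needed because OneHotEncoder returns a sparse matrix by default.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    'age': [25, 32, 47, 51],
    'salary': [50000, 60000, 80000, 90000],
    'city': ['Paris', 'London', 'Paris', 'Berlin'],
})

# Step 1: fit each transformer only on its own columns
scaled = StandardScaler().fit_transform(df[['age', 'salary']])
encoded = OneHotEncoder().fit_transform(df[['city']]).toarray()  # sparse -> dense

# Step 2: concatenate the blocks side by side, in transformer order
manual = np.hstack([scaled, encoded])

# ColumnTransformer performs both steps in one call
preprocessor = ColumnTransformer(transformers=[
    ('num', StandardScaler(), ['age', 'salary']),
    ('cat', OneHotEncoder(), ['city']),
])
automatic = preprocessor.fit_transform(df)

print(np.allclose(manual, automatic))  # True
```

One behavior the manual version hides: columns not named in any transformer are dropped by default. Passing `remainder='passthrough'` to ColumnTransformer keeps them unchanged at the end of the output.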

When to Use

  • Your dataset has both numerical and categorical features
  • You want to keep preprocessing organized and reproducible

Summary Table

| Feature type | Example columns | Transformation |
|---|---|---|
| Numerical | age, salary | StandardScaler |
| Categorical | city | OneHotEncoder |

Heterogeneous transformation lets you handle mixed-type, real-world data with a single preprocessing object that applies the right transformation to each group of columns, keeping the steps clear and reproducible.