
Feature Scaling

What is Feature Scaling?

Feature scaling is the process of normalizing the range of features in a dataset. Because many machine learning algorithms rely on distance calculations or gradient-based optimization, features with larger ranges can dominate those with smaller ranges, even when they are no more important for prediction.

Why Scale Features?

| Benefit | Description |
| --- | --- |
| Algorithm Performance | Many algorithms, such as SVM, KNN, and neural networks, perform better with scaled features |
| Convergence Speed | Gradient descent converges faster when features are on similar scales |
| Equal Importance | Prevents features with larger values from dominating smaller but equally important features |
| Numerical Stability | Avoids computational issues with very large or small numbers |
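
For instance, in a Euclidean-distance computation an income measured in dollars will swamp an age measured in years. The sketch below (feature values are purely illustrative) shows the distance before and after standardization:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Two samples: [age in years, income in dollars]
a = np.array([30.0, 50000.0])
b = np.array([60.0, 51000.0])

# Unscaled distance: the 1000-dollar income gap dominates the 30-year age gap
print(np.linalg.norm(a - b))  # ~1000.4

# After standardizing over a small sample, each feature contributes
# in proportion to how unusual the difference is for that feature
X = np.array([[30.0, 50000.0], [60.0, 51000.0], [45.0, 70000.0], [25.0, 30000.0]])
X_scaled = StandardScaler().fit_transform(X)
print(np.linalg.norm(X_scaled[0] - X_scaled[1]))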

Common Scaling Techniques

Scaling Methods Comparison

| Method | Formula | Output Range | Preserves Distribution | Handles Outliers |
| --- | --- | --- | --- | --- |
| StandardScaler | z = (x - μ) / σ | Unbounded | Yes | No |
| MinMaxScaler | x' = (x - min) / (max - min) | [0, 1] | No | No |
| MaxAbsScaler | x' = x / max(|x|) | [-1, 1] | No | No |
| RobustScaler | z = (x - median) / IQR | Unbounded | Yes | Yes |

StandardScaler (Standardization)

Transforms features to have mean=0 and standard deviation=1 (z-score normalization).

from sklearn.preprocessing import StandardScaler
import numpy as np
import pandas as pd

# Sample data with different scales
data = {
  'height': [165, 180, 175, 160, 185],  # in cm
  'weight': [60, 85, 75, 55, 90],       # in kg
  'age': [25, 30, 35, 40, 45]           # in years
}
df = pd.DataFrame(data)

# Create and apply scaler
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)

# Convert to DataFrame for better display
scaled_df = pd.DataFrame(
  scaled_data,
  columns=df.columns
)

print("Original data:")
print(df)
print("
Scaled data:")
print(scaled_df)
print(f"
Mean: {scaled_df.mean()}")
print(f"Std: {scaled_df.std()}")

Formula:

z = (x - μ) / σ

Where:

  • z is the standardized value
  • x is the original value
  • μ is the mean of the feature
  • σ is the standard deviation of the feature
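
As a quick check, the transformation can be reproduced by hand with NumPy. The sketch below reuses the height values from the example above; note that StandardScaler uses the population standard deviation (ddof=0):

import numpy as np
from sklearn.preprocessing import StandardScaler

height = np.array([165, 180, 175, 160, 185], dtype=float)

# Manual z-score with the population standard deviation (ddof=0)
manual = (height - height.mean()) / height.std(ddof=0)

# Same result from StandardScaler
scaled = StandardScaler().fit_transform(height.reshape(-1, 1)).ravel()
print(np.allclose(manual, scaled))  # True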

MinMaxScaler (Normalization)

Scales features to a specific range, typically [0,1].

from sklearn.preprocessing import MinMaxScaler
import numpy as np
import pandas as pd

# Sample data with different scales
data = {
  'height': [165, 180, 175, 160, 185],  # in cm
  'weight': [60, 85, 75, 55, 90],       # in kg
  'age': [25, 30, 35, 40, 45]           # in years
}
df = pd.DataFrame(data)

# Create and apply scaler
min_max_scaler = MinMaxScaler()
normalized = min_max_scaler.fit_transform(df)

# Convert to DataFrame for better display
normalized_df = pd.DataFrame(
  normalized,
  columns=df.columns
)

print("Original data:")
print(df)
print("
Normalized data:")
print(normalized_df)
print(f"
Min: {normalized_df.min()}")
print(f"Max: {normalized_df.max()}")

Formula:

x_scaled = (x - min(x)) / (max(x) - min(x))
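
The target interval does not have to be [0, 1]; MinMaxScaler accepts a feature_range parameter. A minimal sketch mapping the height values above to [-1, 1]:

from sklearn.preprocessing import MinMaxScaler
import numpy as np

heights = np.array([[165.0], [180.0], [175.0], [160.0], [185.0]])

# Scale to [-1, 1] instead of the default [0, 1]
scaler = MinMaxScaler(feature_range=(-1, 1))
print(scaler.fit_transform(heights).ravel())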

MaxAbsScaler

Scales features by dividing by the maximum absolute value in each feature. Preserves zero values and does not shift/center the data.

from sklearn.preprocessing import MaxAbsScaler
import numpy as np
import pandas as pd

# Sample data with zeros and negative values
data = {
  'feature1': [1, -2, 3, -4, 5],
  'feature2': [0, 10, -10, 20, -20]
}
df = pd.DataFrame(data)

# Create and apply scaler
max_abs_scaler = MaxAbsScaler()
scaled = max_abs_scaler.fit_transform(df)

# Convert to DataFrame for better display
scaled_df = pd.DataFrame(
  scaled,
  columns=df.columns
)

print("Original data:")
print(df)
print("
Max Abs scaled data:")
print(scaled_df)
print(f"
Max absolute values: {max_abs_scaler.max_abs_}")

Formula:

x_scaled = x / max(|x|)
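
Because MaxAbsScaler only divides and never shifts the data, it can be applied to sparse matrices without destroying sparsity. A minimal sketch using a SciPy CSR matrix (the values are illustrative):

from scipy.sparse import csr_matrix
from sklearn.preprocessing import MaxAbsScaler

# Mostly-zero matrix stored in sparse format
X_sparse = csr_matrix([[0.0, 4.0], [2.0, 0.0], [0.0, -8.0]])

scaled = MaxAbsScaler().fit_transform(X_sparse)
print(scaled.toarray())  # zeros stay zero; each column is divided by its max |value|
print(type(scaled))      # the result is still a sparse matrix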

RobustScaler

Uses statistics that are robust to outliers (median and interquartile range).

from sklearn.preprocessing import RobustScaler
import numpy as np
import pandas as pd

# Sample data with outliers
data = {
  'salary': [50000, 55000, 60000, 65000, 500000],  # last value is outlier
  'age': [25, 30, 35, 40, 90]                      # last value is outlier
}
df = pd.DataFrame(data)

# Create and apply scaler
robust_scaler = RobustScaler()
robust_scaled = robust_scaler.fit_transform(df)

# Convert to DataFrame for better display
robust_df = pd.DataFrame(
  robust_scaled,
  columns=df.columns
)

print("Original data:")
print(df)
print("
Robust scaled data:")
print(robust_df)
print(f"
Center (median): {robust_scaler.center_}")
print(f"Scale (IQR): {robust_scaler.scale_}")

Formula:

z = (x - median) / IQR

Where:

  • IQR is the interquartile range (75th percentile - 25th percentile)
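
The calculation is easy to verify by hand, and the percentiles that define the "IQR" can be changed through RobustScaler's quantile_range parameter (default (25.0, 75.0)). A minimal sketch reusing the salary values from the example above:

import numpy as np
from sklearn.preprocessing import RobustScaler

salary = np.array([50000, 55000, 60000, 65000, 500000], dtype=float)

# Manual version: subtract the median, divide by the interquartile range
q75, q25 = np.percentile(salary, [75, 25])
manual = (salary - np.median(salary)) / (q75 - q25)

scaled = RobustScaler().fit_transform(salary.reshape(-1, 1)).ravel()
print(np.allclose(manual, scaled))  # True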

Comparing Scaling Methods

Effect of Scaling on Outliers

| Value | Original | StandardScaler | MinMaxScaler | RobustScaler |
| --- | --- | --- | --- | --- |
| Normal | 1.5 | 0.13 | 0.55 | 0.20 |
| Normal | 2.5 | 0.65 | 0.75 | 0.80 |
| Outlier | 100 | 33.2 | 1.00 | 24.5 |

from sklearn.preprocessing import StandardScaler, MinMaxScaler, MaxAbsScaler, RobustScaler
import numpy as np
import pandas as pd

# Generate sample data with outliers
np.random.seed(42)
data = np.random.normal(loc=0, scale=1, size=100)
data = np.append(data, [10, -10, 15, -15])  # Add outliers

# Reshape for sklearn
X = data.reshape(-1, 1)

# Apply different scalers
scalers = {
  'Standard': StandardScaler(),
  'MinMax': MinMaxScaler(),
  'MaxAbs': MaxAbsScaler(),
  'Robust': RobustScaler()
}

scaled_data = {}
for name, scaler in scalers.items():
  scaled_data[name] = scaler.fit_transform(X).flatten()

# Create DataFrame for comparison
results = pd.DataFrame({
  'Original': data,
  **scaled_data
})

# Print statistics
print("Data statistics:")
print(results.describe().round(2))

# Print how outliers are handled
print("
Outlier values after scaling:")
outlier_idx = np.abs(data) > 5
print(results.loc[outlier_idx].round(2))

When to Use Each Scaler

| Scaler | Best For | Preserves Zero | Handles Outliers | Range |
| --- | --- | --- | --- | --- |
| StandardScaler | Normal distributions, PCA, clustering | No | No | Unbounded |
| MinMaxScaler | Neural networks, algorithms requiring bounded values | No | No | [0, 1] or custom |
| MaxAbsScaler | Sparse data with zeros | Yes | No | [-1, 1] |
| RobustScaler | Data with outliers | No | Yes | Unbounded |

Algorithm-Specific Recommendations

| Algorithm | Recommended Scaler | Reason |
| --- | --- | --- |
| Linear Regression | StandardScaler | Keeps coefficients comparable; important with regularization or gradient-based solvers |
| Logistic Regression | StandardScaler | Sensitive to feature scales |
| SVM | StandardScaler or MinMaxScaler | Distance-based algorithm |
| Neural Networks | MinMaxScaler | Bounded activation functions work better with [0, 1] inputs |
| K-means | StandardScaler | Distance-based algorithm |
| PCA | StandardScaler | Variance-based technique |
| Decision Trees | No scaling needed | Invariant to feature scales |
| Random Forest | No scaling needed | Invariant to feature scales |
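
The practical impact on a distance-based model is easy to see. The sketch below (using the breast cancer dataset and a KNN classifier purely for illustration) compares test accuracy with and without standardization:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# KNN on raw features: distances are dominated by the largest-valued features
raw_knn = KNeighborsClassifier().fit(X_train, y_train)
print(f"Unscaled accuracy: {raw_knn.score(X_test, y_test):.3f}")

# KNN with standardization applied inside a pipeline
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier()).fit(X_train, y_train)
print(f"Scaled accuracy:   {scaled_knn.score(X_test, y_test):.3f}")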

Feature Scaling in Pipelines

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer

# Load data
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
  data.data, data.target, test_size=0.2, random_state=42
)

# Create pipeline with scaling
pipeline = Pipeline([
  ('scaler', StandardScaler()),
  ('pca', PCA(n_components=5)),
  ('classifier', RandomForestClassifier(random_state=42))
])

# Train pipeline
pipeline.fit(X_train, y_train)

# Evaluate
score = pipeline.score(X_test, y_test)
print(f"Pipeline accuracy: {score:.4f}")

# Access transformed data
X_scaled = pipeline.named_steps['scaler'].transform(X_test[:5])
print("
Scaled data (first 5 samples, first 3 features):")
print(X_scaled[:, :3].round(2))
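
A further benefit of keeping the scaler inside the pipeline is that cross-validation refits it on every training fold, so hyperparameter searches stay leakage-free. A minimal sketch, continuing from the pipeline and data defined above (the parameter values are illustrative):

from sklearn.model_selection import GridSearchCV

# The scaler and PCA are refit on each CV fold automatically
param_grid = {'pca__n_components': [2, 5, 10]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

print(f"Best params: {search.best_params_}")
print(f"Best CV accuracy: {search.best_score_:.4f}")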

Common Mistakes with Feature Scaling

Correct vs. Incorrect Scaling Workflow

| Step | Correct Approach | Incorrect Approach |
| --- | --- | --- |
| 1 | Split data into train/test | Scale the entire dataset |
| 2 | Fit scaler on training data | Split into train/test |
| 3 | Transform training data | Train model on scaled data |
| 4 | Transform test data using same scaler | Make predictions |
| 5 | Train model on scaled training data | — |
| 6 | Make predictions on scaled test data | — |

1. Scaling Before Splitting (Data Leakage)

The example below contrasts the two workflows; fitting the scaler on the full dataset leaks test-set statistics into training.

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import numpy as np

# Generate data
np.random.seed(42)
X = np.random.normal(loc=0, scale=10, size=(1000, 5))
y = np.random.randint(0, 2, size=1000)

# CORRECT: Split first, then scale
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit scaler on training data only
scaler = StandardScaler()
scaler.fit(X_train)

# Transform both training and test data
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("CORRECT approach:")
print(f"Training data mean: {X_train_scaled.mean(axis=0)[:3].round(3)}")
print(f"Test data mean: {X_test_scaled.mean(axis=0)[:3].round(3)}")

# INCORRECT: Scale before splitting (data leakage)
X_scaled_full = StandardScaler().fit_transform(X)
X_train_wrong, X_test_wrong, _, _ = train_test_split(X_scaled_full, y, test_size=0.2, random_state=42)

print("
INCORRECT approach:")
print(f"Training data mean: {X_train_wrong.mean(axis=0)[:3].round(3)}")
print(f"Test data mean: {X_test_wrong.mean(axis=0)[:3].round(3)}")

2. Forgetting to Scale New Data

When making predictions on new data, you must use the same scaler that was fit on the training data.
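
A simple way to guarantee this is to persist the fitted scaler (or the whole pipeline) and reload it at prediction time. A minimal sketch using joblib; the filename and data are illustrative:

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

# Fit the scaler on training data and save it
X_train = np.random.normal(loc=0, scale=10, size=(100, 3))
scaler = StandardScaler().fit(X_train)
joblib.dump(scaler, 'scaler.joblib')

# Later, at prediction time: reload and apply the SAME fitted scaler
loaded_scaler = joblib.load('scaler.joblib')
X_new = np.random.normal(loc=0, scale=10, size=(5, 3))
X_new_scaled = loaded_scaler.transform(X_new)  # transform only, never fit again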

3. Scaling Target Variables

In regression, be careful when scaling target variables, as you'll need to inverse transform predictions.
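
One option is scikit-learn's TransformedTargetRegressor, which scales the target during fitting and automatically inverse-transforms predictions back to the original units. A minimal sketch with synthetic data:

import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem with a large-scale target
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = 1000.0 * X[:, 0] + 50.0 * rng.normal(size=100)

# The target is standardized before fitting; predictions come back
# in the original units via the inverse transform
model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    transformer=StandardScaler()
)
model.fit(X, y)
print(model.predict(X[:3]))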

Best Practices

Checklist for Feature Scaling

| Task | Description |
| --- | --- |
| ✓ Analyze your data | Check for outliers and distribution shapes |
| ✓ Choose appropriate scaler | Based on data characteristics and algorithm requirements |
| ✓ Split data first | Always split before scaling to prevent data leakage |
| ✓ Use pipelines | Ensure consistent preprocessing for all data |
| ✓ Save your scaler | Store the fitted scaler for future predictions |
| ✓ Check scaled data | Verify scaling worked as expected |
| ✓ Handle outliers | Consider RobustScaler or removing outliers if appropriate |