Polynomial Regression
Where to Use Polynomial Regression
Polynomial regression is useful when the relationship between your features and the target is curved or non-linear, but you still want a simple, interpretable model.
Common use cases:
- Predicting growth that accelerates or decelerates (e.g., population, sales)
- Modeling physical phenomena with curves (e.g., projectile motion)
- Fitting data that looks like a curve, not a straight line
Why Use Polynomial Regression?
- Flexibility: Can fit curved data better than linear regression
- Simplicity: Still easy to understand and explain
- Interpretability: You can see how each power of the feature affects the prediction
How to Use Polynomial Regression
- Prepare your data: Make sure your features (X) and target (y) are numbers.
- Transform features: Use PolynomialFeatures to add powers of your features (e.g., x, x², x³).
- Build a pipeline: Combine feature transformation and regression into one step.
- Train the model: Fit the pipeline to your training data.
- Make predictions: Use the pipeline to predict on new data.
- Evaluate: Check how well your model predicts using metrics like mean squared error.
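Here is one possible end-to-end sketch of these six steps. The synthetic data, the train/test split, and degree=2 are illustrative choices for this sketch, not part of any fixed recipe.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.arange(1, 21).reshape(-1, 1)                 # step 1: numeric features
y = X.ravel() ** 2 + rng.normal(scale=2, size=20)   # noisy quadratic target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearRegression())           # steps 2-3: transform + regress
model.fit(X_train, y_train)                         # step 4: train
preds = model.predict(X_test)                       # step 5: predict
print('MSE:', mean_squared_error(y_test, preds))    # step 6: evaluate

The sections below walk through each of these steps in more detail.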
What are the Inputs and Outputs?
- Input (X): Table of numbers (features). Each row is a sample, each column is a feature.
- Output (y): A single number for each sample (the value you want to predict).
- Prediction: The model outputs a number for each input row, which is its guess for the target value.
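As a quick illustration of those shapes (the numbers here are made up):

import numpy as np

X = np.array([[1], [2], [3], [4], [5]])   # shape (5, 1): one row per sample
y = np.array([1, 4, 9, 16, 25])           # shape (5,): one target per sample
print(X.shape, y.shape)                   # prints (5, 1) (5,)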
How Does Polynomial Regression Work?
Polynomial regression transforms your features by adding powers (like x², x³) and then fits a linear model to these new features. This lets the model fit curves, not just straight lines.
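To make that concrete, here is a hand-rolled sketch of the same idea using plain NumPy (no scikit-learn): we build the columns x and x² ourselves and solve an ordinary least-squares problem over them. The model is linear in its coefficients even though the fitted curve is not a straight line.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 2                                   # a curved target

# Build the expanded feature matrix by hand: columns [1, x, x^2]
A = np.column_stack([np.ones_like(x), x, x ** 2])

# Ordinary least squares over the expanded columns -- still a linear model
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # roughly [0, 0, 1], i.e. y ≈ 0 + 0·x + 1·x²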
Step 1: Visualize the Problem
Suppose you have data that looks like a curve, not a straight line.
import numpy as np
import matplotlib.pyplot as plt
# Create some curved data
X = np.arange(1, 6).reshape(-1, 1)
y = np.array([1, 4, 9, 16, 25])  # y = x^2
plt.scatter(X, y)
plt.xlabel('X')
plt.ylabel('y')
plt.title('Curved Data Example')
plt.show()
Explanation:
- This code creates and plots data where y = x², which is a curve.
Step 2: Transform Features for Polynomial Regression
To fit a curve, we need to add new features: x², x³, etc. PolynomialFeatures does this for us.
from sklearn.preprocessing import PolynomialFeatures
# 1. Create the transformer (degree=2 for x and x^2)
poly = PolynomialFeatures(degree=2, include_bias=False)
# 2. Transform the features
X_poly = poly.fit_transform(X)
print('Transformed features:\n', X_poly)
Explanation:
- PolynomialFeatures(degree=2): Adds x² as a new feature alongside x.
- fit_transform(X): Makes a new table with both x and x² for each sample.
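With more than one feature, PolynomialFeatures also adds interaction terms. A small sketch (the two-feature input here is made up for illustration):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_two = np.array([[2, 3]])   # one sample, two features a and b
poly2 = PolynomialFeatures(degree=2, include_bias=False)
print(poly2.fit_transform(X_two))
# [[2. 3. 4. 6. 9.]] -> columns are a, b, a², a·b, b²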
Step 3: Implement Polynomial Regression via Pipeline
A pipeline lets you combine the transformation and regression steps so you don't have to do them separately.
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
# 1. Make a pipeline: transform features, then fit regression
pipeline = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
LinearRegression())
# 2. Train the pipeline
pipeline.fit(X, y)
# 3. Make predictions
preds = pipeline.predict(X)
print('Predictions:', preds)
Explanation:
- make_pipeline(...): Chains together the feature transformation and regression steps.
- fit(X, y): Trains the whole pipeline.
- predict(X): Makes predictions using the pipeline.
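The how-to list above also mentions evaluating with mean squared error. A minimal sketch, continuing from the pipeline and predictions in this step:

from sklearn.metrics import mean_squared_error

# Compare predictions to the true targets. Here we evaluate on the
# training data for simplicity; in practice, use held-out test data.
print('Mean squared error:', mean_squared_error(y, preds))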
Step 4: Visualize the Fitted Curve
Let's see how well the model fits the data.
plt.scatter(X, y, label='Original Data')
plt.plot(X, preds, color='red', label='Polynomial Fit')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Polynomial Regression Fit')
plt.legend()
plt.show()
Explanation:
- This code plots the original data and the curve predicted by the model.
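Finally, the how-to list says to predict on new data, meaning inputs the model has not seen. A short sketch continuing from the fitted pipeline above; the new x values are arbitrary:

X_new = np.array([[6], [7]])     # x values outside the training data
print(pipeline.predict(X_new))   # expect values close to 36 and 49 (6² and 7²)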