Linear Regression

Where to Use Linear Regression

Linear regression is best used when you want to predict a number (like price, temperature, or score) based on one or more features. It works well when the relationship between the input features and the output is roughly a straight line (linear).

Common use cases:

Predicting house prices from features like size and location
Estimating sales based on advertising spend
Forecasting temperature from weather data

Why Use Linear Regression?

Simplicity: Easy to understand and implement
Interpretability: You can see how each feature affects the prediction
Speed: Fast to train, even on large datasets
Baseline: Good starting point before trying more complex models

How to Use Linear Regression

Prepare your data: Make sure your features (X) and target (y) are numbers. Handle missing values and scale features if needed.
Split your data: Use train_test_split to separate training and test sets.
Choose a model: Start with LinearRegression for small/medium data, or SGDRegressor for large data.
Train the model: Call .fit(X_train, y_train).
Make predictions: Use .predict(X_test).
Evaluate: Check how well your model predicts on the test set using a metric like mean squared error (MSE), and compare the result against a simple baseline.
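The steps above can be sketched end to end as follows. This is a minimal example, assuming synthetic data generated from y = 3x + 2 with a little noise; the data, the 80/20 split, and the variable names are illustrative assumptions, not part of any real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Prepare: synthetic numeric data (assumption): y = 3*x + 2 plus small noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, size=100)

# Split: hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Choose and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict and evaluate on data the model has not seen
preds = model.predict(X_test)
mse = mean_squared_error(y_test, preds)
print('Test MSE:', mse)
```

Because the noise here has a standard deviation of 0.5, a test MSE somewhere near 0.25 would suggest the model has captured the underlying line.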

What are the Inputs and Outputs?

Input (X): Table of numbers (features). Each row is a sample, each column is a feature (e.g., size, age, number of rooms).
Output (y): A single number for each sample (the value you want to predict).
Prediction: The model outputs a number for each input row, which is its guess for the target value.
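To make the shapes concrete, here is a small sketch using made-up housing numbers (the values and column meanings are assumptions for illustration only). X is a 2D array with one row per sample, and y is a 1D array with one target per row.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: columns are size (square meters) and age (years)
X = np.array([[50.0, 10.0],
              [80.0, 5.0],
              [120.0, 20.0]])
# One target (price) per row of X
y = np.array([150_000.0, 260_000.0, 310_000.0])

print(X.shape)  # (3, 2): 3 samples, 2 features
print(y.shape)  # (3,): one target per sample

# The model returns one predicted number per input row
preds = LinearRegression().fit(X, y).predict(X)
print(preds.shape)  # (3,)
```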

How Does Linear Regression Work?

Linear regression finds the best straight line (or hyperplane for many features) that fits your data. It does this by adjusting weights (coefficients) so that the sum of squared differences between the predicted values and the real data points is as small as possible.

For one feature, it's a line: y = weight * x + bias
For many features: y = w1*x1 + w2*x2 + ... + bias

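The model learns the weights and bias during training. After training, you can inspect them to see how each feature influences the prediction. Keep in mind that weights are only directly comparable when the features are on similar scales.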
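The formula can be checked by hand with NumPy. The sketch below assumes noise-free data built from known weights (2 and -1) and a bias of 0.5, then recovers them with np.linalg.lstsq, a least-squares solver. It illustrates the idea and is not a description of scikit-learn's internals.

```python
import numpy as np

# Small made-up dataset with two features
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0],
              [4.0, 3.0]])

# Assumed "true" weights and bias used to generate the targets
w_true = np.array([2.0, -1.0])
b_true = 0.5
y = X @ w_true + b_true  # y = w1*x1 + w2*x2 + bias

# Append a column of ones so the bias is learned as one extra weight
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Solve for the weights that minimize the squared error
solution, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
w, b = solution[:2], solution[2]

print('Recovered weights:', w)
print('Recovered bias:', b)
```

Since the data has no noise, the recovered weights and bias match the ones used to generate it, up to floating-point rounding.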

What is Linear Regression?

Linear regression is a simple and widely used method for predicting a continuous value based on one or more input features.

Step 1: Create a Baseline with Dummy Regressor

Before building a real model, it's helpful to create a baseline. A Dummy Regressor is a simple model that just predicts the average value from the training data. This helps you check if your real model is actually learning something useful.

import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# 1. Create some example data
y = np.array([1, 2, 3, 4, 5])  # Target values
X = np.arange(5).reshape(-1, 1)  # Features: [[0], [1], [2], [3], [4]]

# 2. Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 3. Create and fit the dummy regressor
baseline = DummyRegressor(strategy='mean')
baseline.fit(X_train, y_train)

# 4. Predict and evaluate
preds = baseline.predict(X_test)
mse = mean_squared_error(y_test, preds)
print('Baseline predictions:', preds)
print('Baseline MSE:', mse)

Explanation:

DummyRegressor(strategy='mean'): Always predicts the average of the training targets.
fit(X_train, y_train): Learns the mean from the training data.
predict(X_test): Predicts the mean for all test samples.
mean_squared_error(y_test, preds): Measures how far off the predictions are from the real values.
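To see that nothing more is going on, the following sketch uses a tiny hand-made split (the numbers are assumptions chosen so the arithmetic is easy) and computes the MSE manually alongside mean_squared_error.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error

# Hand-made split (assumption): the training targets average to 3.0
X_train = np.array([[0], [1], [2], [3]])
y_train = np.array([1.0, 2.0, 3.0, 6.0])
X_test = np.array([[4], [5]])
y_test = np.array([5.0, 7.0])

baseline = DummyRegressor(strategy='mean').fit(X_train, y_train)
preds = baseline.predict(X_test)
print(preds)  # [3. 3.]: the training mean, repeated for every test row

# MSE by hand: ((5 - 3)^2 + (7 - 3)^2) / 2 = (4 + 16) / 2 = 10
manual_mse = np.mean((y_test - preds) ** 2)
print(manual_mse, mean_squared_error(y_test, preds))
```

Note that the dummy model ignores X entirely; the features are passed only to keep the same fit/predict interface as a real model.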

Step 2: Train a Real Linear Regression Model

Now let's train a real model that tries to find the best line through the data.

from sklearn.linear_model import LinearRegression

# 1. Create the model
model = LinearRegression()

# 2. Train the model on the training data
model.fit(X_train, y_train)

# 3. Make predictions on the test data
preds = model.predict(X_test)
print('Predictions:', preds)

Explanation:

LinearRegression(): Creates a model that fits a straight line (or hyperplane) using ordinary least squares.
fit(X_train, y_train): Finds the best line using the training data.
predict(X_test): Uses the line to predict values for the test data.
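A prediction is only meaningful next to a baseline, so a quick comparison is worth running. This sketch rebuilds the same toy data as in Step 1 so it can run on its own, then scores both models with mean squared error.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Same toy data as Step 1: the targets lie exactly on the line y = x + 1
X = np.arange(5).reshape(-1, 1)
y = np.array([1, 2, 3, 4, 5])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

baseline = DummyRegressor(strategy='mean').fit(X_train, y_train)
model = LinearRegression().fit(X_train, y_train)

baseline_mse = mean_squared_error(y_test, baseline.predict(X_test))
model_mse = mean_squared_error(y_test, model.predict(X_test))
print('Baseline MSE:', baseline_mse)
print('LinearRegression MSE:', model_mse)
```

Because this toy data is perfectly linear, the LinearRegression error should be essentially zero while the baseline error is not. On real data the gap is usually much smaller.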

Step 3: Use SGDRegressor for Large Datasets

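SGDRegressor is another way to fit a linear model. It uses a method called stochastic gradient descent, which updates the weights a little at a time from individual samples instead of solving for them all at once. That makes it a good fit for large datasets. It is sensitive to feature scaling, so scale your features before training on real data.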

from sklearn.linear_model import SGDRegressor

# 1. Create the SGDRegressor model
sgd = SGDRegressor(max_iter=1000, tol=1e-3, random_state=0)

# 2. Train the model
sgd.fit(X_train, y_train)

# 3. Make predictions
preds = sgd.predict(X_test)
print('SGD Predictions:', preds)

Explanation:

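SGDRegressor(max_iter=1000, tol=1e-3): Uses stochastic gradient descent to fit the model. max_iter is the maximum number of passes (epochs) over the training data. tol is the stopping tolerance: training ends early once the loss stops improving by at least tol for n_iter_no_change consecutive epochs (5 by default).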
fit(X_train, y_train): Trains the model.
predict(X_test): Makes predictions.
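On real data, features often sit on very different scales, which can slow down or destabilize gradient descent. One common pattern is to put a StandardScaler in front of SGDRegressor using make_pipeline. The sketch below assumes synthetic data with one feature in the range 0 to 1 and another in the range 0 to 1000; the data and names are illustrative only.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): two features on very different scales
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 1, 200), rng.uniform(0, 1000, 200)])
y = 4.0 * X[:, 0] + 0.01 * X[:, 1] + rng.normal(0, 0.1, 200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The scaler is fit on the training data only, then applied before SGD
pipe = make_pipeline(
    StandardScaler(),
    SGDRegressor(max_iter=1000, tol=1e-3, random_state=0),
)
pipe.fit(X_train, y_train)

r2 = pipe.score(X_test, y_test)  # R^2 on the held-out test set
print('Test R^2:', r2)
```

Wrapping both steps in a pipeline means predict and score apply the same scaling automatically, so the test data is never used to compute the scaling statistics.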

Key Parameters of SGDRegressor

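Parameter	Purpose
`max_iter`	Maximum number of passes (epochs) over the training data
`tol`	Stopping tolerance; training ends early when the loss stops improving by at least this amount
`learning_rate`	Learning rate schedule ('constant', 'optimal', 'invscaling', or 'adaptive')
`penalty`	Regularization term (e.g., 'l2', 'l1', or 'elasticnet')
`eta0`	Initial learning rate used by the 'constant', 'invscaling', and 'adaptive' schedules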
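Here is a sketch that sets these parameters explicitly. It assumes synthetic features drawn from a standard normal distribution (so they are already on a common scale); the values passed match what I understand to be scikit-learn's defaults, written out only to show where each one goes.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Synthetic, already-scaled data (assumption for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 1.0 + rng.normal(0, 0.1, size=500)

sgd = SGDRegressor(
    max_iter=1000,               # upper limit on epochs
    tol=1e-3,                    # stop early when the loss stops improving
    learning_rate='invscaling',  # step size shrinks as training proceeds
    eta0=0.01,                   # starting step size
    penalty='l2',                # ridge-style regularization
    random_state=0,
)
sgd.fit(X, y)

# n_iter_ reports how many epochs actually ran before stopping
print('Epochs run:', sgd.n_iter_)
print('Weights:', sgd.coef_)
```

If n_iter_ comes out equal to max_iter, the model hit the limit before meeting the tol criterion, and raising max_iter or adjusting eta0 may be worth trying.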

Step 4: Accessing Model Weights

After training, you can look at the weights (also called coefficients) and the intercept (the bias or starting value) to see what the model learned.

print('LinearRegression weights:', model.coef_)
print('LinearRegression intercept:', model.intercept_)
print('SGDRegressor weights:', sgd.coef_)
print('SGDRegressor intercept:', sgd.intercept_)

Explanation:

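coef_: The weights for each feature. Each weight is the change in the prediction for a one-unit increase in that feature, with the other features held fixed. The sign shows the direction of the effect. A larger absolute value only signals a more important feature when the features are on comparable scales.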
intercept_: The bias term. It's the value predicted when all features are zero.
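The effect of feature scale on the weights can be seen in a short sketch. It assumes made-up housing data where price = 1000 * size + 5000 * rooms + 20000 with no noise; the feature names and numbers are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical features: size in square meters (50 to 200), rooms (1 to 5)
rng = np.random.default_rng(0)
size = rng.uniform(50, 200, 100)
rooms = rng.integers(1, 6, 100).astype(float)
X = np.column_stack([size, rooms])
y = 1000.0 * size + 5000.0 * rooms + 20000.0

# Raw features: weights are "price change per unit" of each feature
raw_coef = LinearRegression().fit(X, y).coef_
print('Raw weights:', raw_coef)

# Standardized features: weights are "price change per standard deviation"
X_scaled = StandardScaler().fit_transform(X)
scaled_coef = LinearRegression().fit(X_scaled, y).coef_
print('Standardized weights:', scaled_coef)
```

On the raw features, rooms has the larger weight simply because one room is a bigger step than one square meter. After standardizing, size has the larger weight, because it varies far more across the dataset and so accounts for more of the variation in price.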
