Advanced Linear Regression: Theory and Implementation


Introduction

Linear regression is a fundamental technique in machine learning and statistics for modeling the relationship between a dependent variable and one or more independent variables. In this article, we will explore basic and advanced implementations of linear regression, focusing on:

  1. Basic Linear Regression using scikit-learn.
  2. Advanced Linear Regression with additional features such as regularization, optimization algorithms, and polynomial features.
  3. Highly Advanced Linear Regression incorporating feature selection, early stopping, and performance evaluation.

Basic Linear Regression

Linear regression models the relationship between a target variable y and one or more feature variables X by fitting a linear equation to observed data. In this section, we will use scikit-learn to perform a basic linear regression.

Problem Statement

We want to train a model that represents a linear relationship between a feature matrix X and a target vector y.

Solution

Here’s a step-by-step implementation of linear regression using scikit-learn:

from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression

# Generate features matrix and target vector
features, target = make_regression(n_samples=100, n_features=3, n_informative=2, n_targets=1, noise=0.2, coef=False, random_state=1)

# Create and fit linear regression model
regression = LinearRegression()
model = regression.fit(features, target)

# View the intercept
print("Intercept:", model.intercept_)

# View the feature coefficients
print("Coefficients:", model.coef_)

# Predict the target value of the first observation
prediction = model.predict(features)[0]
print("Prediction for the first observation:", prediction)

# Evaluate the model
score = model.score(features, target)
print("R2 Score:", score)

Discussion

In linear regression, the goal is to model the relationship between one dependent variable y and one or more independent variables x₁, x₂, x₃, … as a linear function. The general equation for linear regression can be written as:

y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + ⋯ + βₙxₙ + ε

Here’s a breakdown of the components:

  • y: The target or dependent variable you are trying to predict.
  • xᵢ: The feature or independent variables used to predict y.
  • βᵢ: The coefficients (or weights) that represent the strength and direction of the relationship between each feature xᵢ and the target y. β₀ is the intercept, which is the value of y when all xᵢ are zero.
  • ε: The error term, which accounts for the variability in y that cannot be explained by the linear relationship with the features.
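
To make the equation concrete, here is a minimal sketch with made-up numbers (the coefficients below are illustrative, not taken from a fitted model) showing how the intercept and coefficients combine into a prediction:

import numpy as np

# Hypothetical parameters for a three-feature model
beta_0 = 1.5                          # intercept β₀
betas = np.array([2.0, -0.5, 0.3])    # coefficients β₁, β₂, β₃
x = np.array([4.0, 10.0, 2.0])        # one observation of the features x₁, x₂, x₃

# y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ (ignoring the error term ε)
y_hat = beta_0 + np.dot(betas, x)
print(y_hat)  # 1.5 + 8.0 - 5.0 + 0.6 = 5.1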

When using a library like scikit-learn in Python for linear regression, you typically interact with these components:

  • intercept_: This attribute gives you the value of β₀ (the bias term or intercept) from the trained model, accessed as model.intercept_.
  • coef_: This attribute provides the coefficients β₁, β₂, β₃, … for each feature in the model, accessed as model.coef_.
  • score: This method returns the R² score of the model via model.score(X, y). R² measures how well the model explains the variability of the target variable: it is the proportion of variance in y that is predictable from the features xᵢ.
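
Under the hood, R² is defined as 1 − SS_res / SS_tot. The short sketch below, using made-up numbers, reproduces what model.score (and sklearn.metrics.r2_score) compute:

import numpy as np

# Hypothetical observed targets and model predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.1, 7.2, 8.9])

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1 - ss_res / ss_tot                         # proportion of variance explained
print(r2)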

Example

Here’s a quick example using scikit-learn in Python:

from sklearn.linear_model import LinearRegression
import numpy as np

# Sample data
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
y = np.array([1, 2, 3])

# Create and fit the model
model = LinearRegression()
model.fit(X, y)

# Outputs
print("Intercept (β0):", model.intercept_)
print("Coefficients (β1, β2, β3):", model.coef_)
print("R² Score:", model.score(X, y))

This code will fit a linear model to the data and provide you with the intercept, coefficients, and R² score.

Advanced Linear Regression

We may need to consider regularization, different optimization algorithms, and polynomial features for more complex scenarios. The following implementation provides a customizable linear regression class with these advanced features.
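
As a quick reference, these penalties change the cost function that gradient descent minimizes. Writing the unregularized mean squared error as J = (1/n) Σᵢ (yᵢ − ŷᵢ)², and ignoring constant scale factors, the variants implemented below add:

  • L2 (ridge): J + λ Σⱼ βⱼ²
  • L1 (lasso): J + λ Σⱼ |βⱼ|
  • Elastic net: J + λ [r Σⱼ |βⱼ| + (1 − r) Σⱼ βⱼ²], where r is the L1 ratio.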

Implementation

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, KFold
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.metrics import mean_squared_error, r2_score
from typing import Literal, Optional, Tuple, List

class AdvancedLinearRegression:
    def __init__(self,
                 learning_rate: float = 0.01,       # How big of a step we take in each update.
                 n_iterations: int = 1000,          # How many times we update our model.
                 regularization: Literal['none', 'l1', 'l2', 'elastic_net'] = 'none',  # Type of penalty for large weights.
                 lambda_reg: float = 0.1,           # How strong the penalty is.
                 l1_ratio: float = 0.5,             # Mix of L1 and L2 penalties for elastic net.
                 optimizer: Literal['sgd', 'adam', 'rmsprop'] = 'adam',  # Method to update our model.
                 polynomial_degree: int = 1,        # Degree of polynomial features to use.
                 batch_size: Optional[int] = None): # Size of data batches for updates.
        """
        Initialize the AdvancedLinearRegression model with parameters for training.

        Parameters:
        - learning_rate (float): Determines how much we adjust weights each step.
        - n_iterations (int): Number of steps to train the model.
        - regularization (Literal): Type of penalty applied to weights ('none', 'l1', 'l2', 'elastic_net').
        - lambda_reg (float): Strength of the penalty applied.
        - l1_ratio (float): Mix of L1 and L2 penalties for elastic net regularization.
        - optimizer (Literal): Method for updating weights ('sgd', 'adam', 'rmsprop').
        - polynomial_degree (int): Degree of polynomial features to be used.
        - batch_size (Optional[int]): Number of samples in each batch for training.
        """
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.regularization = regularization
        self.lambda_reg = lambda_reg
        self.l1_ratio = l1_ratio
        self.optimizer = optimizer
        self.polynomial_degree = polynomial_degree
        self.batch_size = batch_size
        self.weights = None
        self.bias = None
        self.scaler = StandardScaler()  # Standardizes features to zero mean and unit variance
        self.poly_features = PolynomialFeatures(degree=polynomial_degree, include_bias=False)  # Creates polynomial features

    def _initialize_parameters(self, n_features: int) -> None:
        """
        Initialize model parameters (weights and bias) and set up optimizer-specific variables.

        Parameters:
        - n_features (int): Number of features in the dataset.
        """
        self.weights = np.zeros(n_features)  # Start with weights set to zero
        self.bias = 0                        # Start with bias set to zero

        if self.optimizer in ['adam', 'rmsprop']:
            # For Adam and RMSprop optimizers, initialize additional variables
            self.m = np.zeros(n_features)  # First moment vector (used in Adam)
            self.v = np.zeros(n_features)  # Second moment vector (used in Adam and RMSprop)
            self.beta1 = 0.9               # Decay rate for first moment estimate (Adam)
            self.beta2 = 0.999             # Decay rate for second moment estimate (Adam and RMSprop)
            self.epsilon = 1e-8            # Small number to avoid division by zero (Adam and RMSprop)
            self.t = 0                     # Time step (used in Adam)

    def _compute_gradients(self, X: np.ndarray, y: np.ndarray, y_pred: np.ndarray) -> Tuple[np.ndarray, float]:
        """
        Compute gradients for weights and bias, including regularization.

        Parameters:
        - X (np.ndarray): Input features.
        - y (np.ndarray): True target values.
        - y_pred (np.ndarray): Predicted target values.

        Returns:
        - dw (np.ndarray): Gradient of weights.
        - db (float): Gradient of bias.
        """
        n_samples = X.shape[0]
        dw = (1 / n_samples) * np.dot(X.T, (y_pred - y))  # Gradient of the squared error with respect to the weights
        db = (1 / n_samples) * np.sum(y_pred - y)         # Gradient of the squared error with respect to the bias

        # Add the regularization term to the weight gradient
        if self.regularization == 'l2':
            # L2 regularization penalizes large weights
            dw += (self.lambda_reg / n_samples) * self.weights
        elif self.regularization == 'l1':
            # L1 regularization penalizes the absolute size of the weights
            dw += (self.lambda_reg / n_samples) * np.sign(self.weights)
        elif self.regularization == 'elastic_net':
            # Elastic net combines the L1 and L2 penalties
            dw += (self.lambda_reg / n_samples) * (self.l1_ratio * np.sign(self.weights)
                                                   + (1 - self.l1_ratio) * self.weights)

        return dw, db

    def _update_parameters(self, dw: np.ndarray, db: float) -> None:
        """
        Update model parameters based on the gradients and optimizer.

        Parameters:
        - dw (np.ndarray): Gradient of weights.
        - db (float): Gradient of bias.
        """
        if self.optimizer == 'sgd':
            # Update weights and bias using stochastic gradient descent
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db
        elif self.optimizer == 'adam':
            # Update weights using the Adam optimizer
            self.t += 1
            self.m = self.beta1 * self.m + (1 - self.beta1) * dw         # Update first moment vector
            self.v = self.beta2 * self.v + (1 - self.beta2) * (dw ** 2)  # Update second moment vector
            m_hat = self.m / (1 - self.beta1 ** self.t)                  # Bias correction for first moment
            v_hat = self.v / (1 - self.beta2 ** self.t)                  # Bias correction for second moment
            self.weights -= self.learning_rate * m_hat / (np.sqrt(v_hat) + self.epsilon)  # Update weights
            self.bias -= self.learning_rate * db                         # Update bias
        elif self.optimizer == 'rmsprop':
            # Update weights using the RMSprop optimizer
            self.v = self.beta2 * self.v + (1 - self.beta2) * (dw ** 2)  # Update second moment vector
            self.weights -= self.learning_rate * dw / (np.sqrt(self.v) + self.epsilon)  # Update weights
            self.bias -= self.learning_rate * db                         # Update bias

    def fit(self, X: np.ndarray, y: np.ndarray) -> 'AdvancedLinearRegression':
        """
        Train the model on the provided data.

        Parameters:
        - X (np.ndarray): Input features.
        - y (np.ndarray): Target values.

        Returns:
        - self: The trained model.
        """
        X_poly = self.poly_features.fit_transform(X)  # Create polynomial features
        X_scaled = self.scaler.fit_transform(X_poly)  # Standardize features
        n_samples, n_features = X_scaled.shape

        self._initialize_parameters(n_features)  # Initialize model parameters

        for _ in range(self.n_iterations):
            if self.batch_size:
                # Use mini-batch gradient descent if a batch size is specified
                indices = np.random.choice(n_samples, self.batch_size, replace=False)
                X_batch = X_scaled[indices]
                y_batch = y[indices]
            else:
                X_batch, y_batch = X_scaled, y

            y_pred = np.dot(X_batch, self.weights) + self.bias           # Predict target values
            dw, db = self._compute_gradients(X_batch, y_batch, y_pred)   # Compute gradients
            self._update_parameters(dw, db)                              # Update parameters

        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        """
        Make predictions using the trained model.

        Parameters:
        - X (np.ndarray): Input features.

        Returns:
        - np.ndarray: Predicted target values.
        """
        X_poly = self.poly_features.transform(X)   # Transform features to polynomial features
        X_scaled = self.scaler.transform(X_poly)   # Standardize features
        return np.dot(X_scaled, self.weights) + self.bias  # Predict target values

    def score(self, X: np.ndarray, y: np.ndarray) -> float:
        """
        Evaluate the model using the R² score.

        Parameters:
        - X (np.ndarray): Input features.
        - y (np.ndarray): True target values.

        Returns:
        - float: R² score, which measures how well the model fits the data.
        """
        y_pred = self.predict(X)    # Predict target values
        return r2_score(y, y_pred)  # Compute R² score

def cross_validate(model: AdvancedLinearRegression, X: np.ndarray,
                   y: np.ndarray, n_splits: int = 5) -> Tuple[float, float]:
    """
    Perform k-fold cross-validation to evaluate the model's performance.

    Parameters:
    - model (AdvancedLinearRegression): The model to evaluate.
    - X (np.ndarray): Input features.
    - y (np.ndarray): Target values.
    - n_splits (int): Number of folds in cross-validation.

    Returns:
    - Tuple[float, float]: Average R² score and average mean squared error across folds.
    """
    kf = KFold(n_splits=n_splits)  # Split data into k folds
    r2_scores = []                 # List to store R² scores
    mse_scores = []                # List to store mean squared error scores

    for train_index, test_index in kf.split(X):
        # Split data into training and test sets
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]

        model.fit(X_train, y_train)     # Train model on training data
        y_pred = model.predict(X_test)  # Predict on test data

        r2_scores.append(r2_score(y_test, y_pred))             # R² score for this fold
        mse_scores.append(mean_squared_error(y_test, y_pred))  # Mean squared error for this fold

    return np.mean(r2_scores), np.mean(mse_scores)  # Average R² score and mean squared error

# Example usage
if __name__ == "__main__":
    # Generate a synthetic dataset
    X = np.linspace(0, 10, 100).reshape(-1, 1)              # 100 evenly spaced numbers from 0 to 10
    y = 2 * X.flatten() + 1 + np.random.normal(0, 1, 100)   # Target values with some noise

    # Initialize and train the model
    model = AdvancedLinearRegression(learning_rate=0.01, n_iterations=1000,
                                     regularization='l2', lambda_reg=0.1,
                                     optimizer='adam', polynomial_degree=2)
    model.fit(X, y)

    # Predict and evaluate the model
    y_pred = model.predict(X)
    print(f"R² score: {model.score(X, y):.4f}")  # R² score of the model on the training data

    # Cross-validation
    avg_r2, avg_mse = cross_validate(model, X, y, n_splits=5)
    print(f"Cross-validation R² score: {avg_r2:.4f}")  # Average R² score across folds
    print(f"Cross-validation MSE: {avg_mse:.4f}")      # Average mean squared error across folds

    # Plot results
    plt.scatter(X, y, color='blue', label='Actual data')   # Plot actual data points
    plt.plot(X, y_pred, color='red', label='Fitted line')  # Plot the fitted curve
    plt.xlabel('X')
    plt.ylabel('y')
    plt.title('Advanced Linear Regression')
    plt.legend()
    plt.show()  # Display the plot
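
For comparison, much of the same behavior (degree-2 polynomial features, standardization, and an L2 penalty) can be approximated with scikit-learn's built-in tools. The following is a minimal sketch, not a drop-in replacement for the class above, since Ridge solves the regularized least-squares problem directly rather than with an iterative optimizer:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge

X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 2 * X.flatten() + 1 + np.random.normal(0, 1, 100)

# Polynomial features, standardization, and an L2-penalized linear model in one pipeline
pipeline = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                         StandardScaler(),
                         Ridge(alpha=0.1))
pipeline.fit(X, y)
print("R² score:", pipeline.score(X, y))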
