Introduction
Linear regression is a fundamental technique in machine learning and statistics for modeling the relationship between a dependent variable and one or more independent variables. In this article, we will explore basic and advanced implementations of linear regression, focusing on:
- Basic Linear Regression using scikit-learn.
- Advanced Linear Regression with additional features such as regularization, optimization algorithms, and polynomial features.
- Highly Advanced Linear Regression incorporating feature selection, early stopping, and performance evaluation.
Basic Linear Regression
Linear regression models the relationship between a target variable y and one or more feature variables X by fitting a linear equation to observed data. In this section, we will use scikit-learn to perform a basic linear regression.
Problem Statement
We want to train a model that represents a linear relationship between a feature matrix X and a target vector y.
Solution
Here’s a step-by-step implementation of linear regression using scikit-learn:
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
# Generate features matrix and target vector
features, target = make_regression(n_samples=100, n_features=3, n_informative=2, n_targets=1, noise=0.2, coef=False, random_state=1)
# Create and fit linear regression model
regression = LinearRegression()
model = regression.fit(features, target)
# View the intercept
print("Intercept:", model.intercept_)
# View the feature coefficients
print("Coefficients:", model.coef_)
# Predict the target value of the first observation
prediction = model.predict(features)[0]
print("Prediction for the first observation:", prediction)
# Evaluate the model
score = model.score(features, target)
print("R2 Score:", score)
Discussion
In linear regression, the goal is to model the relationship between one dependent variable y and one or more independent variables x1, x2, x3, … as a linear function. The general equation for linear regression can be written as:
y = β0 + β1x1 + β2x2 + β3x3 + ⋯ + βnxn + ϵ
Here’s a breakdown of the components:
- y: The target or dependent variable you are trying to predict.
- xi: The features or independent variables used to predict y.
- βi: The coefficients (or weights) that represent the strength and direction of the relationship between each feature xi and the target y. β0 is the intercept, which is the value of y when all xi are zero.
- ϵ: The error term, which accounts for the variability in y that cannot be explained by the linear relationship with the features.
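To make the equation concrete, here is a minimal numeric sketch of how a single prediction is computed from the intercept and coefficients. The β values and features below are made up purely for illustration:
import numpy as np
# Hypothetical coefficients and one observation, chosen only to illustrate the formula
beta_0 = 0.5                      # intercept (β0)
betas = np.array([2.0, -1.0])     # coefficients (β1, β2)
x = np.array([3.0, 4.0])          # feature values (x1, x2)
# y = β0 + β1*x1 + β2*x2 (ignoring the error term ϵ)
y_hat = beta_0 + np.dot(betas, x)
print(y_hat)  # 0.5 + 2*3 + (-1)*4 = 2.5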
When using a library like scikit-learn in Python for linear regression, you typically interact with these components:
- intercept_: This attribute gives you the value of β0 (the bias term or intercept) from the trained model, accessed as model.intercept_.
- coef_: This attribute provides the coefficients β1, β2, β3, … for each feature in the model, accessed as model.coef_.
- score: This method returns the R² score of the model, which measures how well the model explains the variability of the target variable. The R² value is the proportion of variance in y that is predictable from the features xi. It is called as model.score(X, y).
Example
Here’s a quick example using scikit-learn in Python:
from sklearn.linear_model import LinearRegression
import numpy as np
# Sample data
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
y = np.array([1, 2, 3])
# Create and fit the model
model = LinearRegression()
model.fit(X, y)
# Outputs
print("Intercept (β0):", model.intercept_)
print("Coefficients (β1, β2, β3):", model.coef_)
print("R² Score:", model.score(X, y))
This code will fit a linear model to the data and provide you with the intercept, coefficients, and R² score.
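To connect the score method back to its definition, the R² value can also be computed by hand as 1 − SS_res / SS_tot. The following sketch reuses the model, X, and y from the example above and should agree with model.score(X, y):
import numpy as np
# R² = 1 - SS_res / SS_tot, where SS_res is the residual sum of squares
# and SS_tot is the total sum of squares around the mean of y
y_pred = model.predict(X)
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
print("Manual R²:", 1 - ss_res / ss_tot)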
Advanced Linear Regression
For more complex scenarios, we may need to consider regularization, different optimization algorithms, and polynomial features. The following implementation provides a customizable linear regression class with these advanced features.
Implementation
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, KFold
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.metrics import mean_squared_error, r2_score
from typing import Literal, Optional, Tuple, List
class AdvancedLinearRegression:
def __init__(self,
learning_rate: float = 0.01,
# How big of a step we take in each update.
n_iterations: int = 1000,
# How many times we update our model.
regularization: Literal['none', 'l1', 'l2', 'elastic_net'] = 'none',
# Type of penalty for large weights.
lambda_reg: float = 0.1,
# How strong the penalty is.
l1_ratio: float = 0.5,
# Mix of L1 and L2 penalties for elastic net.
optimizer: Literal['sgd', 'adam', 'rmsprop'] = 'adam',
# Method to update our model.
polynomial_degree: int = 1,
# Degree of polynomial features to use.
batch_size: Optional[int] = None):
# Size of data batches for updates.
"""
Initialize the AdvancedLinearRegression model with parameters for training.
Parameters:
- learning_rate (float): Determines how much we adjust weights each step.
- n_iterations (int): Number of steps to train the model.
- regularization (Literal): Type of penalty applied to weights ('none', 'l1', 'l2', 'elastic_net').
- lambda_reg (float): Strength of the penalty applied.
- l1_ratio (float): Mix of L1 and L2 penalties for elastic net regularization.
- optimizer (Literal): Method for updating weights ('sgd', 'adam', 'rmsprop').
- polynomial_degree (int): Degree of polynomial features to be used.
- batch_size (Optional[int]): Number of samples in each batch for training.
"""
self.learning_rate = learning_rate
self.n_iterations = n_iterations
self.regularization = regularization
self.lambda_reg = lambda_reg
self.l1_ratio = l1_ratio
self.optimizer = optimizer
self.polynomial_degree = polynomial_degree
self.batch_size = batch_size
self.weights = None
self.bias = None
self.scaler = StandardScaler() # Tool to standardize features to zero mean and unit variance
self.poly_features = PolynomialFeatures(degree=polynomial_degree, include_bias=False) # Tool to create polynomial features
def _initialize_parameters(self, n_features: int) -> None:
"""
Initialize model parameters (weights and bias) and setup optimizer-specific variables.
Parameters:
- n_features (int): Number of features in the dataset.
"""
self.weights = np.zeros(n_features)
# Start with weights set to zero
self.bias = 0
# Start with bias set to zero
if self.optimizer in ['adam', 'rmsprop']:
# For Adam and RMSprop optimizers, initialize additional variables
self.m = np.zeros(n_features)
# First moment vector (used in Adam)
self.v = np.zeros(n_features)
# Second moment vector (used in Adam and RMSprop)
self.beta1 = 0.9
# Decay rate for first moment estimate (Adam)
self.beta2 = 0.999
# Decay rate for second moment estimate (Adam)
self.epsilon = 1e-8
# Small number to avoid division by zero (Adam and RMSprop)
self.t = 0
# Time step (used in Adam)
def _compute_gradients(self, X: np.ndarray, y: np.ndarray, y_pred: np.ndarray) -> Tuple[np.ndarray, float]:
"""
Compute gradients for weights and bias, including regularization.
Parameters:
- X (np.ndarray): Input features.
- y (np.ndarray): True target values.
- y_pred (np.ndarray): Predicted target values.
Returns:
- dw (np.ndarray): Gradient of weights.
- db (float): Gradient of bias.
"""
n_samples = X.shape[0]
dw = (1 / n_samples) * np.dot(X.T, (y_pred - y)) # Compute gradient for weights
db = (1 / n_samples) * np.sum(y_pred - y) # Compute gradient for bias
# Add regularization to gradients
if self.regularization == 'l2':
dw += (self.lambda_reg / n_samples) * self.weights
# L2 regularization adds penalty for large weights
elif self.regularization == 'l1':
dw += (self.lambda_reg / n_samples) * np.sign(self.weights)
# L1 regularization penalizes the absolute values of the weights
elif self.regularization == 'elastic_net':
dw += (self.lambda_reg / n_samples) * (self.l1_ratio * np.sign(self.weights) + (1 - self.l1_ratio) * self.weights)
# Elastic Net regularization combines L1 and L2 penalties
return dw, db
def _update_parameters(self, dw: np.ndarray, db: float) -> None:
"""
Update model parameters based on the gradients and optimizer.
Parameters:
- dw (np.ndarray): Gradient of weights.
- db (float): Gradient of bias.
"""
if self.optimizer == 'sgd':
# Update weights and bias using Stochastic Gradient Descent
self.weights -= self.learning_rate * dw
self.bias -= self.learning_rate * db
elif self.optimizer == 'adam':
# Update weights using Adam Optimizer
self.t += 1
self.m = self.beta1 * self.m + (1 - self.beta1) * dw
# Update first moment vector
self.v = self.beta2 * self.v + (1 - self.beta2) * (dw ** 2)
# Update second moment vector
m_hat = self.m / (1 - self.beta1 ** self.t)
# Bias correction for first moment
v_hat = self.v / (1 - self.beta2 ** self.t)
# Bias correction for second moment
self.weights -= self.learning_rate * m_hat / (np.sqrt(v_hat) + self.epsilon)
# Update weights
self.bias -= self.learning_rate * db
# Update bias
elif self.optimizer == 'rmsprop':
# Update weights using RMSprop Optimizer
self.v = self.beta2 * self.v + (1 - self.beta2) * (dw ** 2)
# Update second moment vector
self.weights -= self.learning_rate * dw / (np.sqrt(self.v) + self.epsilon)
# Update weights
self.bias -= self.learning_rate * db
# Update bias
def fit(self, X: np.ndarray, y: np.ndarray) -> 'AdvancedLinearRegression':
"""
Train the model on the provided data.
Parameters:
- X (np.ndarray): Input features.
- y (np.ndarray): Target values.
Returns:
- self: The trained model.
"""
X_poly = self.poly_features.fit_transform(X)
# Create polynomial features
X_scaled = self.scaler.fit_transform(X_poly)
# Standardize features
n_samples, n_features = X_scaled.shape
self._initialize_parameters(n_features)
# Initialize model parameters
for _ in range(self.n_iterations):
if self.batch_size:
# Use mini-batch gradient descent if batch size is specified
indices = np.random.choice(n_samples, self.batch_size, replace=False)
X_batch = X_scaled[indices]
y_batch = y[indices]
else:
X_batch, y_batch = X_scaled, y
y_pred = np.dot(X_batch, self.weights) + self.bias
# Predict target values
dw, db = self._compute_gradients(X_batch, y_batch, y_pred)
# Compute gradients
self._update_parameters(dw, db)
# Update parameters
return self
def predict(self, X: np.ndarray) -> np.ndarray:
"""
Make predictions using the trained model.
Parameters:
- X (np.ndarray): Input features.
Returns:
- np.ndarray: Predicted target values.
"""
X_poly = self.poly_features.transform(X)
# Transform features to polynomial features
X_scaled = self.scaler.transform(X_poly)
# Standardize features
return np.dot(X_scaled, self.weights) + self.bias
# Predict target values
def score(self, X: np.ndarray, y: np.ndarray) -> float:
"""
Evaluate the model using the R² score.
Parameters:
- X (np.ndarray): Input features.
- y (np.ndarray): True target values.
Returns:
- float: R² score, which measures how well the model fits the data.
"""
y_pred = self.predict(X)
# Predict target values
return r2_score(y, y_pred)
# Compute R² score
def cross_validate(model: AdvancedLinearRegression, X: np.ndarray,
y: np.ndarray, n_splits: int = 5) -> Tuple[float, float]:
"""
Perform k-fold cross-validation to evaluate the model's performance.
Parameters:
- model (AdvancedLinearRegression): The model to evaluate.
- X (np.ndarray): Input features.
- y (np.ndarray): Target values.
- n_splits (int): Number of folds in cross-validation.
Returns:
- Tuple[float, float]: Average R² score and average mean squared error across folds.
"""
kf = KFold(n_splits=n_splits)
# Split data into k folds
r2_scores = []
# List to store R² scores
mse_scores = []
# List to store mean squared error scores
for train_index, test_index in kf.split(X):
X_train, X_test = X[train_index], X[test_index]
# Split data into training and test sets
y_train, y_test = y[train_index], y[test_index]
model.fit(X_train, y_train)
# Train model on training data
y_pred = model.predict(X_test)
# Predict on test data
r2_scores.append(r2_score(y_test, y_pred))
# Compute R² score for this fold
mse_scores.append(mean_squared_error(y_test, y_pred))
# Compute mean squared error for this fold
return np.mean(r2_scores), np.mean(mse_scores)
# Return average R² score and mean squared error
# Example usage
if __name__ == "__main__":
# Generate synthetic dataset
X = np.linspace(0, 10, 100).reshape(-1, 1)
# Create 100 evenly spaced numbers from 0 to 10
y = 2 * X.flatten() + 1 + np.random.normal(0, 1, 100)
# Create target values with some noise
# Initialize and train the model
model = AdvancedLinearRegression(learning_rate=0.01, n_iterations=1000,
regularization='l2', lambda_reg=0.1,
optimizer='adam', polynomial_degree=2)
model.fit(X, y)
# Predict and evaluate the model
y_pred = model.predict(X)
print(f"R² score: {model.score(X, y):.4f}")
# Print R² score of the model
# Cross-validation
avg_r2, avg_mse = cross_validate(model, X, y, n_splits=5)
print(f"Cross-validation R² score: {avg_r2:.4f}")
# Print average R² score from cross-validation
print(f"Cross-validation MSE: {avg_mse:.4f}")
# Print average mean squared error from cross-validation
# Plot results
plt.scatter(X, y, color='blue', label='Actual data')
# Plot actual data points
plt.plot(X, y_pred, color='red', label='Fitted line')
# Plot the fitted line
plt.xlabel('X')
# Label for X-axis
plt.ylabel('y')
# Label for Y-axis
plt.title('Advanced Linear Regression')
# Title of the plot
plt.legend()
# Add legend to the plot
plt.show()
# Display the plot
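Because the 'l2' option above corresponds to ridge-style regularization, one way to sanity-check the class is to compare it with scikit-learn's Ridge on the same synthetic data. The sketch below assumes the X and y arrays from the example usage are in scope; the scores will be similar but not identical, since the custom class scales the penalty by the number of samples and fits by gradient descent:
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
# Ridge regression with degree-2 polynomial features, mirroring the custom model's setup
ridge_model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    Ridge(alpha=0.1),
)
ridge_model.fit(X, y)
print(f"Ridge R² score: {ridge_model.score(X, y):.4f}")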