The Complete Guide to Feature Scaling in Machine Learning

Introduction

Feature scaling is a crucial preprocessing step in machine learning, ensuring that all features contribute equally to the learning process. Without scaling, machine learning algorithms like Principal Component Analysis (PCA), K-Nearest Neighbors (KNN), Support Vector Machines (SVM), and neural networks may underperform. This guide will explain why scaling matters, cover common techniques like standardization and min-max scaling, and show their impact on PCA and classification accuracy.

By the end of this guide, you’ll understand:

  • The role of feature scaling in machine learning.
  • Key differences between standardization and min-max scaling.
  • How scaling improves PCA and classification performance.

Why Is Feature Scaling Important in Machine Learning?

Feature scaling ensures that all features in a dataset contribute equally to algorithms, preventing those with larger ranges from dominating the model. This is particularly important in distance-based algorithms like K-Nearest Neighbors (KNN) and Principal Component Analysis (PCA), where unscaled data can lead to poor results.

Example of the Impact of Scaling

Consider a dataset with age values between 0 and 100 and income ranging from 0 to 100,000. Left unscaled, any distance-based algorithm will weight income far more heavily than age, since even small income differences dwarf the entire age range, skewing predictions and reducing the model’s accuracy.
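As a quick sanity check, here is a tiny sketch with made-up numbers showing how the raw Euclidean distance between two people is driven almost entirely by income:

import numpy as np

# Two hypothetical people: [age in years, income in dollars]
person_a = np.array([25, 50_000])
person_b = np.array([60, 51_000])

# Unscaled distance: the $1,000 income gap dwarfs the 35-year age gap
print(np.linalg.norm(person_a - person_b))  # ~1000.6, age barely registers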

Feature Scaling Techniques

1. Standardization (Z-Score Normalization)

Standardization transforms each feature to have a mean of 0 and a standard deviation of 1 via z = (x − μ) / σ, where μ is the feature’s mean and σ its standard deviation. It is well suited to algorithms that assume roughly normally distributed data, such as PCA, SVM, and logistic regression.

Python Code Example: Standardization

import pandas as pd
from sklearn.preprocessing import StandardScaler
from typing import List, Tuple

def load_data(url: str, usecols: List[int], column_names: List[str]) -> pd.DataFrame:
    """
    Load data from a CSV file with specified columns and column names.

    Args:
        url (str): URL of the CSV file.
        usecols (List[int]): Indices of columns to use.
        column_names (List[str]): Names for the selected columns.

    Returns:
        pd.DataFrame: Loaded dataframe with specified columns.
    """
    return pd.read_csv(url, usecols=usecols, names=column_names, header=None)

def standardize_features(df: pd.DataFrame, features: List[str]) -> Tuple[pd.DataFrame, StandardScaler]:
    """
    Standardize selected features of a dataframe.

    Args:
        df (pd.DataFrame): Input dataframe.
        features (List[str]): List of feature names to standardize.

    Returns:
        Tuple[pd.DataFrame, StandardScaler]: Copy of the dataframe with
        standardized features, and the fitted scaler.
    """
    scaler = StandardScaler()
    df_scaled = df.copy()  # work on a copy so the original data stays intact
    df_scaled[features] = scaler.fit_transform(df[features])
    return df_scaled, scaler

def main():
    url = 'https://raw.githubusercontent.com/rasbt/pattern_classification/master/data/wine_data.csv'
    usecols = [0, 1, 2]
    column_names = ['Class label', 'Alcohol', 'Malic acid']
    features_to_standardize = ['Alcohol', 'Malic acid']

    df = load_data(url, usecols, column_names)
    df_std, scaler = standardize_features(df, features_to_standardize)

    print("Original data:")
    print(df.head())
    print("\nStandardized data:")
    print(df_std.head())
    print(f"\nScaler mean: {scaler.mean_}")
    print(f"Scaler variance: {scaler.var_}")

if __name__ == "__main__":
    main()

When to Use Standardization

  • When your model relies on roughly normally distributed data (PCA, SVM).
  • When you need equal importance across features, especially in distance-based models like KNN (see the sketch below).
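To make the KNN point concrete, here is a minimal sketch (using scikit-learn’s bundled wine dataset rather than the CSV above) comparing a KNN classifier with and without standardization; the scaler sits inside a pipeline so it is fitted on the training split only:

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# KNN on raw, unscaled features
knn_raw = KNeighborsClassifier(n_neighbors=5)
knn_raw.fit(X_train, y_train)
print("KNN without scaling:", knn_raw.score(X_test, y_test))

# KNN with standardization applied inside a pipeline
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn_scaled.fit(X_train, y_train)
print("KNN with scaling:", knn_scaled.score(X_test, y_test))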

2. Min-Max Scaling (Normalization)

Min-max scaling rescales each feature to a fixed range, typically [0, 1], via x' = (x − min) / (max − min). It’s commonly applied to datasets that don’t follow a Gaussian distribution, especially in neural networks or image processing.

Python Code Example: Min-Max Scaling

from sklearn.preprocessing import MinMaxScaler

# df is the wine dataframe loaded in the standardization example
features = ['Alcohol', 'Malic acid']

# Initialize and fit the Min-Max Scaler (fit_transform returns a NumPy array)
minmax_scaler = MinMaxScaler()
df_minmax = minmax_scaler.fit_transform(df[features])

# Print the scaled features
print(df_minmax)

When to Use Min-Max Scaling

  • For non-Gaussian distributions.
  • For neural networks, where features on a similar scale speed up convergence (a short usage sketch follows).
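Two details worth knowing: MinMaxScaler accepts a custom feature_range, and the fitted scaler can map values back to the original units via inverse_transform. A minimal sketch with made-up data:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[1.0], [5.0], [10.0]])

# Scale into [-1, 1] instead of the default [0, 1]
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(data)
print(scaled.ravel())  # [-1.    -0.111  1.   ]

# inverse_transform recovers the original values
print(scaler.inverse_transform(scaled).ravel())  # [ 1.  5. 10.]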

Visualizing Feature Scaling

To fully grasp the effects of standardization and min-max scaling, it helps to plot the original and transformed features side by side.

import matplotlib.pyplot as plt

def plot_scaling(df, df_std, df_minmax):
    """
    Plot the original, standardized, and Min-Max scaled features for comparison.

    Parameters:
        df (DataFrame): Original DataFrame with 'Alcohol' and 'Malic acid'.
        df_std (DataFrame): DataFrame with standardized feature values.
        df_minmax (ndarray): Min-Max scaled feature values.
    """
    plt.figure(figsize=(8, 6))

    # Plot original scale
    plt.scatter(df['Alcohol'], df['Malic acid'], color='green', label='Original Scale', alpha=0.5)

    # Plot standardized scale (df_std is a DataFrame, so index by column name)
    plt.scatter(df_std['Alcohol'], df_std['Malic acid'], color='red', label='Standardized', alpha=0.5)

    # Plot Min-Max scaled (a NumPy array: column 0 = Alcohol, column 1 = Malic acid)
    plt.scatter(df_minmax[:, 0], df_minmax[:, 1], color='blue', label='Min-Max Scaled', alpha=0.5)

    plt.title('Feature Scaling on Wine Dataset')
    plt.xlabel('Alcohol')
    plt.ylabel('Malic Acid')
    plt.legend(loc='upper left')
    plt.grid()
    plt.show()

# Call the function with the DataFrame and scaled data from the previous examples
plot_scaling(df, df_std, df_minmax)

The Role of Feature Scaling in PCA

What Is PCA?

Principal Component Analysis (PCA) is a dimensionality reduction technique that projects data onto new axes (principal components) chosen to capture maximum variance. Because those axes are driven by variance, PCA is sensitive to feature scaling: without it, features with larger numeric ranges dominate the principal components, reducing PCA’s effectiveness.

Example: PCA with and Without Standardization

Let’s see how PCA performs on both standardized and non-standardized data.

from sklearn.decomposition import PCA

features = ['Alcohol', 'Malic acid']

# Perform PCA on non-standardized data
pca = PCA(n_components=2)
df_pca = pca.fit_transform(df[features])

# Perform PCA on standardized data (feature columns only, not 'Class label')
pca_std = PCA(n_components=2)
df_std_pca = pca_std.fit_transform(df_std[features])

# Print PCA results for non-standardized and standardized data
print("PCA on non-standardized data:\n", df_pca)
print("PCA on standardized data:\n", df_std_pca)

Visualizing PCA with and without Standardization

import matplotlib.pyplot as plt

# Create subplots for PCA visualizations
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(10, 5))

# Plot PCA results on non-standardized data
ax1.scatter(df_pca[:, 0], df_pca[:, 1], color='blue', alpha=0.5)
ax1.set_title('PCA on Non-Standardized Data')
ax1.set_xlabel('1st Principal Component')
ax1.set_ylabel('2nd Principal Component')

# Plot PCA results on standardized data
ax2.scatter(df_std_pca[:, 0], df_std_pca[:, 1], color='red', alpha=0.5)
ax2.set_title('PCA on Standardized Data')
ax2.set_xlabel('1st Principal Component')
ax2.set_ylabel('2nd Principal Component')

# Adjust layout for better fit
plt.tight_layout()
plt.show()

Key Observations:

  • Without scaling, the first principal component is dominated by the feature with the larger range, making the projection nearly one-dimensional and far less informative (quantified in the sketch below).
  • With scaling, all features contribute equally, allowing PCA to better capture the structure of the data.
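One way to quantify this is the explained_variance_ratio_ attribute of the fitted PCA objects from the example above; on unscaled data the first component typically absorbs almost all of the variance simply because one feature has a much larger range:

# pca and pca_std are the fitted PCA objects from the previous example
print("Variance ratio (unscaled):", pca.explained_variance_ratio_)
print("Variance ratio (standardized):", pca_std.explained_variance_ratio_)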

Classifier Performance After PCA: Naive Bayes Example

To demonstrate the effect of scaling on classification, we’ll use a Naive Bayes classifier on PCA-transformed data.

from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Train Naive Bayes on non-standardized PCA data
gnb = GaussianNB()
gnb.fit(df_pca, df['Class label'])
predictions_pca = gnb.predict(df_pca)

# Train Naive Bayes on standardized PCA data
gnb_std = GaussianNB()
gnb_std.fit(df_std_pca, df['Class label'])
predictions_std_pca = gnb_std.predict(df_std_pca)

# Compare training-set accuracy (kept simple here; a fair evaluation
# would score on a held-out test set, as in the sketch further below)
print("Accuracy on Non-Standardized PCA:", accuracy_score(df['Class label'], predictions_pca))
print("Accuracy on Standardized PCA:", accuracy_score(df['Class label'], predictions_std_pca))

Results:

  • Without scaling: classifier accuracy is noticeably lower, because the PCA projection is dominated by the feature with the larger range and discards useful structure.
  • With scaling: both features contribute to the principal components, and the classifier achieves higher accuracy (see the held-out evaluation sketch below).
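For a fairer comparison that avoids scoring on the training data, here is a minimal sketch (reusing df from the earlier examples) that fits the scaler and PCA on a training split only and evaluates on held-out data:

from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB

X = df[['Alcohol', 'Malic acid']]
y = df['Class label']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Pipeline without scaling: PCA -> Naive Bayes
unscaled = make_pipeline(PCA(n_components=2), GaussianNB())
unscaled.fit(X_train, y_train)
print("Test accuracy without scaling:", unscaled.score(X_test, y_test))

# Pipeline with scaling: StandardScaler -> PCA -> Naive Bayes
scaled = make_pipeline(StandardScaler(), PCA(n_components=2), GaussianNB())
scaled.fit(X_train, y_train)
print("Test accuracy with scaling:", scaled.score(X_test, y_test))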

Conclusion

Feature scaling plays a pivotal role in machine learning, especially in algorithms sensitive to the scale of input features, like PCA and KNN. Techniques like standardization and min-max scaling ensure that all features contribute equally, preventing skewed model performance.

Key Takeaways:

  • Standardization is crucial for models that rely on normally distributed data.
  • Min-max scaling is best for models like neural networks and datasets with non-Gaussian distributions.
  • Scaling significantly improves PCA performance and overall classification accuracy.

By understanding and applying feature scaling techniques, you can enhance your machine learning model’s performance in real-world scenarios.
