Feature Scaling and Preprocessing

In machine learning, feature scaling and data preprocessing ensure that all input features are on comparable scales and properly formatted for learning, so no single feature dominates simply because of its numeric range.

Without scaling, algorithms like KNN or gradient descent-based models can become biased toward features with larger numerical ranges. For example, if one feature spans 0–1 and another spans 0–1,000, Euclidean distances in KNN are dominated almost entirely by the second feature.


Common Preprocessing Steps

  • Feature Scaling: Normalize or standardize values so they're on a similar scale.
  • Encoding Categorical Variables: Convert text labels into numbers.
  • Handling Missing Values: Replace or remove nulls. (Both of these steps are sketched just after this list.)
  • Feature Transformation: Apply mathematical transformations (log, polynomial, etc.).

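Before turning to scaling, here is a minimal sketch of the encoding and missing-value steps, assuming Scikit-learn's OneHotEncoder and SimpleImputer on a small made-up table; it is an illustration of the idea, not the lecture's dataset.

Encoding and Imputation in Scikit-learn (sketch)
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.impute import SimpleImputer

# Hypothetical toy data: one categorical column, one numeric column with a missing value
df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size": [1.0, np.nan, 3.0],
})

# Encoding categorical variables: text labels -> one-hot numeric columns
# (sparse_output=False assumes scikit-learn >= 1.2; older versions use sparse=False)
encoder = OneHotEncoder(sparse_output=False)
color_encoded = encoder.fit_transform(df[["color"]])

# Handling missing values: replace NaNs with the column mean
imputer = SimpleImputer(strategy="mean")
size_imputed = imputer.fit_transform(df[["size"]])

print("One-hot encoded colors:\n", color_encoded)
print("Imputed sizes:\n", size_imputed)
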
Example: Standardization and Normalization

Let’s see how to apply standardization and normalization using Scikit-learn’s preprocessing tools.

Scaling Features in Scikit-learn
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Example dataset
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Standardization (mean=0, std=1)
scaler_std = StandardScaler()
X_std = scaler_std.fit_transform(X)

# Normalization (range [0, 1])
scaler_mm = MinMaxScaler()
X_mm = scaler_mm.fit_transform(X)

print("Standardized Data:", X_std)
print("Min-Max Scaled Data:", X_mm)
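
Standardization computes z = (x − mean) / std, while min-max normalization computes (x − min) / (max − min). For the first column (values 1, 2, 3), standardization subtracts the mean (2) and divides by the population standard deviation (≈0.816), giving roughly −1.22, 0, and 1.22; min-max scaling maps the same values to 0, 0.5, and 1. The second column follows the same pattern, so both features end up on comparable scales.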

Choosing the Right Scaling Method

The following are the two most common scaling methods:

  • Standardization — best for algorithms that assume normally distributed features (e.g., logistic regression, SVM).
  • Normalization — best for distance-based or gradient-sensitive models (e.g., KNN, neural networks); a short pipeline sketch follows below.
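
In practice, the scaler is usually wired into the model so it is fit on training data and then applied automatically. The sketch below assumes a KNN classifier on made-up data and uses MinMaxScaler, following the guideline above; it is an illustration, not a prescribed setup.

Scaling Inside a Pipeline (sketch)
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier

# Made-up data: two features with very different ranges
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0], [4.0, 500.0]])
y = np.array([0, 0, 1, 1])

# The pipeline scales the features before they reach the distance-based KNN model
model = Pipeline([
    ("scale", MinMaxScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])
model.fit(X, y)

print(model.predict([[2.5, 350.0]]))
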
Quiz

What is a key reason for applying feature scaling in machine learning models?

Without feature scaling, models like KNN or gradient descent-based algorithms can be biased toward features with ______ numeric ranges.

  • smaller
  • larger
  • equal
  • random
