# Evaluating Regression Models

`Regression models` are used to **predict continuous numeric values**.
To evaluate how well they perform, we rely on specific metrics that measure prediction accuracy and model fit.

The two most commonly used metrics are:

* `Mean Squared Error (MSE)`: The average of the squared differences between predicted and actual values (the closer to 0, the better)
* `Coefficient of Determination (R²)`: Measures how well the model explains the variance in the target variable (the closer to 1.0, the better)

<br/>

## Formula for `Mean Squared Error`

`Mean Squared Error (MSE)` is calculated as the average of the squared differences between predicted and actual values:

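$$
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

Here $n$ is the number of samples, $y_i$ is the actual value, and $\hat{y}_i$ is the predicted value for sample $i$.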
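As a quick check of the formula, the sketch below computes MSE by hand with NumPy and compares the result with scikit-learn's `mean_squared_error`. The four actual and predicted values are made up purely for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted values (illustrative only)
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# MSE by the formula: mean of the squared differences
mse_manual = np.mean((y_true - y_pred) ** 2)

# The same metric computed by scikit-learn
mse_sklearn = mean_squared_error(y_true, y_pred)

print(f"Manual MSE:  {mse_manual:.4f}")
print(f"sklearn MSE: {mse_sklearn:.4f}")
```

Both lines should print the same number. Because the errors are squared, the one prediction that is off by 1.5 contributes more than half of the total.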

<br/>

## Formula for Coefficient of Determination (`R²`)

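The `Coefficient of Determination (R²)` represents how well the model explains the variance of the target variable. It compares the model's squared errors with the squared errors of a baseline that always predicts the mean $\bar{y}$ of the actual values:

$$
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = 1 - \frac{MSE}{Var(y)}
$$

Here $Var(y)$ is the population variance of the actual values (dividing by $n$, not $n - 1$).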
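The relationship between `R²`, `MSE`, and the variance of the target can be checked numerically. The sketch below uses made-up values and assumes `Var(y)` means the population variance, which is what `np.var` computes with its default `ddof=0`.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actual and predicted values (illustrative only)
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# Mean squared error of the predictions
mse = np.mean((y_true - y_pred) ** 2)

# Population variance of the actual values (ddof=0 is the NumPy default)
var_y = np.var(y_true)

# R² from the formula, and from scikit-learn for comparison
r2_manual = 1 - mse / var_y
r2_sklearn = r2_score(y_true, y_pred)

print(f"Manual R²:  {r2_manual:.4f}")
print(f"sklearn R²: {r2_sklearn:.4f}")
```

Using the sample variance instead (`ddof=1`) would give a slightly different number that no longer matches `r2_score`.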

<br/>

## Regression Example: `R²` Score

The following example demonstrates how to evaluate a regression model using the `R²` score:

```python title="R² Score Example"
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Generate synthetic regression data
import numpy as np
rng = np.random.RandomState(0)
X_reg = 2 * rng.rand(50, 1)
y_reg = 4 + 3 * X_reg.ravel() + rng.randn(50)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X_reg, y_reg, test_size=0.2, random_state=42)

# Train the model
reg = LinearRegression()
reg.fit(X_train, y_train)

# Make predictions
y_pred = reg.predict(X_test)

# Evaluate the model
r2 = r2_score(y_test, y_pred)
print(f"R² score: {r2:.3f}")
```

`R²` can be at most `1.0` and has no lower bound. Its values are interpreted as follows:

* `1.0`: Perfect prediction
* `0`: No improvement over predicting the mean
* Negative: Worse than predicting the mean
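These three cases can be reproduced with a few hand-picked predictions. The sketch below is illustrative only: the target values and the "reversed" predictions are hypothetical, chosen so that each case is easy to see.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actual values (illustrative only)
y_true = np.array([1.0, 2.0, 3.0, 4.0])

# Case 1: predictions equal the actual values
r2_perfect = r2_score(y_true, y_true.copy())

# Case 2: always predict the mean of the actual values
mean_pred = np.full_like(y_true, y_true.mean())
r2_mean = r2_score(y_true, mean_pred)

# Case 3: predictions that trend in the opposite direction
bad_pred = np.array([4.0, 3.0, 2.0, 1.0])
r2_bad = r2_score(y_true, bad_pred)

print(f"Perfect predictions: {r2_perfect:.1f}")
print(f"Mean-only baseline:  {r2_mean:.1f}")
print(f"Reversed trend:      {r2_bad:.1f}")
```

The last case is negative because the model's squared errors are larger than those of the mean-only baseline.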

<br/>

## Key Takeaways

* Use *classification metrics* for categorical outputs and regression metrics for continuous outputs.
* The most common regression metrics are `Mean Squared Error (MSE)` and `Coefficient of Determination (R²)`.
* **Lower** `MSE` and **higher** `R²` values generally indicate better performance.
