Lecture

Evaluating Regression Models

Regression models are used to predict continuous numeric values. To evaluate how well they perform, we rely on specific metrics that measure prediction accuracy and model fit.

The two most commonly used are:

  • Mean Squared Error (MSE): The average of the squared differences between predicted and actual values (the closer to 0, the better)
  • Coefficient of Determination (R²): Measures how well the model explains the variance in the target variable (the closer to 1.0, the better)

Formula for Mean Squared Error

Mean Squared Error (MSE) is calculated as the average of the squared differences between predicted and actual values:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
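To make the formula concrete, here is a minimal sketch that computes MSE by hand and compares it with scikit-learn's `mean_squared_error`. Here n is the number of samples, y_i is an actual value, and ŷ_i is the matching prediction. The arrays `y_true` and `y_pred` are hypothetical values chosen only for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted values (illustration only)
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# MSE by the formula: average of the squared differences
mse_manual = np.mean((y_true - y_pred) ** 2)

# The same quantity computed by scikit-learn
mse_sklearn = mean_squared_error(y_true, y_pred)

print(f"Manual MSE:  {mse_manual:.4f}")
print(f"sklearn MSE: {mse_sklearn:.4f}")
```

Both lines should print the same number, since the library function implements the same average of squared differences.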

Formula for Coefficient of Determination (R²)

The Coefficient of Determination (R²) represents how well the model explains the variance of the target variable:

$$R^2 = 1 - \frac{MSE}{Var(y)}$$
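The relationship between R², MSE, and the variance of the target can be checked numerically. The sketch below assumes that Var(y) is the population variance (NumPy's default, `ddof=0`) of the actual values, computed on the same samples as the MSE. The arrays are hypothetical illustration values.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical actual and predicted values (illustration only)
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mse = mean_squared_error(y_true, y_pred)

# Population variance of the actual values (ddof=0 is np.var's default)
var_y = np.var(y_true)

# R² from the formula: 1 - MSE / Var(y)
r2_manual = 1 - mse / var_y

# R² as computed by scikit-learn
r2_sklearn = r2_score(y_true, y_pred)

print(f"Manual R²:  {r2_manual:.4f}")
print(f"sklearn R²: {r2_sklearn:.4f}")
```

The two values agree because dividing both the residual sum of squares and the total sum of squares by n leaves their ratio unchanged. Using the sample variance (`ddof=1`) instead would give a slightly different number.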

Regression Example: R² Score

The following example demonstrates how to evaluate a regression model using the R² score:

R² Score Example

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Generate synthetic regression data
rng = np.random.RandomState(0)
X_reg = 2 * rng.rand(50, 1)
y_reg = 4 + 3 * X_reg.ravel() + rng.randn(50)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X_reg, y_reg, test_size=0.2, random_state=42
)

# Train the model
reg = LinearRegression()
reg.fit(X_train, y_train)

# Make predictions
y_pred = reg.predict(X_test)

# Evaluate the model
r2 = r2_score(y_test, y_pred)
print(f"R² score: {r2:.3f}")
```
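As a possible extension, the same kind of evaluation can report MSE alongside R², since the two metrics are usually read together. This is only a sketch: it regenerates the same style of synthetic data so that it runs on its own, and the variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: a linear trend plus Gaussian noise
rng = np.random.RandomState(0)
X = 2 * rng.rand(50, 1)
y = 4 + 3 * X.ravel() + rng.randn(50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training set, predict on the held-out test set
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Report both metrics on the test set
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.3f}")
print(f"R²:  {r2:.3f}")
```

MSE is expressed in squared units of the target, so it is hard to compare across datasets, whereas R² is unitless.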

Possible values of R² are:

  • 1.0: Perfect prediction
  • 0: No improvement over predicting the mean
  • Negative: Worse than predicting the mean
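The three cases in the list above can be reproduced with a small sketch. The target values and the three sets of "predictions" are hypothetical and were picked only to land in each case.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actual values (illustration only)
y_true = np.array([1.0, 2.0, 3.0, 4.0])

# Case 1: predictions match the actual values exactly
perfect = y_true.copy()

# Case 2: always predict the mean of the actual values
mean_only = np.full_like(y_true, y_true.mean())

# Case 3: predictions that are worse than the mean (reversed order)
reversed_pred = y_true[::-1]

r2_perfect = r2_score(y_true, perfect)
r2_mean = r2_score(y_true, mean_only)
r2_worse = r2_score(y_true, reversed_pred)

print(f"Perfect prediction:  {r2_perfect:.1f}")
print(f"Mean prediction:     {r2_mean:.1f}")
print(f"Reversed prediction: {r2_worse:.1f}")
```

The reversed predictions have a squared error larger than the total variation around the mean, which is what pushes R² below zero.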

Key Takeaways

  • Use classification metrics for categorical outputs and regression metrics for continuous outputs.
  • The most common regression metrics are Mean Squared Error (MSE) and Coefficient of Determination (R²).
  • Lower MSE and higher R² values generally indicate better performance.

Quiz

What is the primary advantage of using a confusion matrix in evaluating a classification model?

It provides the accuracy of the model.

It predicts the future performance of the model.

It reveals where the model is making mistakes for each class.

It generates new datasets for training.
