Lecture

What is Cross-Validation?

Cross-validation is a model evaluation technique that tests how well a model generalizes to unseen data. Instead of using a single train-test split, cross-validation divides the dataset into multiple folds, training and testing the model several times on different subsets.

In k-fold cross-validation:

  1. The data is divided into k folds.
  2. For each fold:
    • Train the model on k-1 folds.
    • Test it on the remaining fold.
  3. Average the results to get a more reliable performance estimate.
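The three steps above can be sketched from scratch in plain Python. This is a minimal illustration, not how any particular library implements it: the helper names `k_fold_indices` and `cross_validate_mean_predictor`, the toy target values, and the "predict the training mean" baseline are all assumptions made up for this example.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate_mean_predictor(y, k):
    """Per-fold mean absolute error of a baseline that predicts the training mean."""
    errors = []
    for test_idx in k_fold_indices(len(y), k):      # step 2: loop over folds
        held_out = set(test_idx)
        train = [y[i] for i in range(len(y)) if i not in held_out]  # k-1 folds
        prediction = mean(train)                     # "train" the toy model
        errors.append(mean(abs(y[i] - prediction) for i in test_idx))  # test fold
    return errors


# Toy targets, made up for illustration only.
y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
fold_errors = cross_validate_mean_predictor(y, k=3)
print("Per-fold error:", fold_errors)
print("Average error:", mean(fold_errors))           # step 3: average the results
```

The folds here are contiguous and unshuffled to keep the sketch short; in practice the data is usually shuffled first unless its order matters (for example, time series).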

Common Cross-Validation Types

  • K-Fold Cross-Validation: Most common, splits the data into k folds of (nearly) equal size.
  • Stratified K-Fold: Maintains class proportions in each fold (important for classification).
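  • Leave-One-Out (LOO): Each observation serves as the test set exactly once (k-fold with k equal to the number of observations).
  • ShuffleSplit: Repeated random train/test splits. Each split is drawn independently, so the same observation can appear in the test set of more than one split.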
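As a rough illustration of how these strategies differ, the sketch below builds each one with scikit-learn's splitter classes and inspects the splits. The 12-sample dataset is made up for this example, and the choice of 4 splits is arbitrary.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut, ShuffleSplit, StratifiedKFold

# Toy data (made up for illustration): 12 samples, 8 of class 0 and 4 of class 1.
X = np.arange(12).reshape(-1, 1)
y = np.array([0] * 8 + [1] * 4)

splitters = {
    "KFold": KFold(n_splits=4),
    "StratifiedKFold": StratifiedKFold(n_splits=4),
    "LeaveOneOut": LeaveOneOut(),
    "ShuffleSplit": ShuffleSplit(n_splits=4, test_size=0.25, random_state=0),
}

n_splits = {}
for name, splitter in splitters.items():
    n_splits[name] = splitter.get_n_splits(X, y)
    # Look at the labels that land in the test part of the first split.
    _, test_idx = next(iter(splitter.split(X, y)))
    print(f"{name}: {n_splits[name]} splits, first test labels = {y[test_idx]}")

# Class counts [class 0, class 1] in every stratified test fold.
stratified_test_counts = [
    np.bincount(y[test_idx], minlength=2).tolist()
    for _, test_idx in StratifiedKFold(n_splits=4).split(X, y)
]
print("Stratified test-fold class counts:", stratified_test_counts)
```

With unshuffled KFold on this sorted toy data, the first test fold contains only class 0, which is the situation stratification is meant to avoid.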

Example: Comparing Models with Cross-Validation

In this example, both models are evaluated using 5-fold cross-validation, and the one with the higher average accuracy is considered better.

Cross-Validation Example
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier

    # Load dataset
    iris = load_iris()
    X, y = iris.data, iris.target

    # Define models
    log_reg = LogisticRegression(max_iter=200)
    knn = KNeighborsClassifier(n_neighbors=5)

    # Cross-validation
    log_scores = cross_val_score(log_reg, X, y, cv=5)
    knn_scores = cross_val_score(knn, X, y, cv=5)

    print(f"Logistic Regression mean score: {log_scores.mean():.3f}")
    print(f"KNN mean score: {knn_scores.mean():.3f}")

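Because both estimators are classifiers, cross_val_score with cv=5 uses stratified folds and scores each fold by accuracy by default. The code prints each model's mean accuracy across the five folds; the model with the higher mean is preferred.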
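When comparing models, it can help to make the cross-validation strategy explicit by passing one shared splitter object to every evaluation, so that each model is scored on identical folds. The sketch below is one way to do this, assuming shuffled stratified folds; the seed of 42, the `max_iter=1000` setting, and the `results` dictionary are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# One splitter with a fixed seed, so every model is evaluated on the same folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv)
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the standard deviation next to the mean gives a sense of whether a gap between two models is larger than the fold-to-fold variation.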


Key Takeaways

  • Model selection aims to choose the model that best balances accuracy and efficiency for the task.
  • Cross-validation gives a more robust estimate of real-world performance.
  • Always use the same cross-validation strategy when comparing models to ensure fairness.
Quiz

What is the primary purpose of using cross-validation in model selection?

Cross-validation helps in _____ the model's performance by splitting the dataset into multiple subsets.

  • training
  • testing
  • evaluating
  • simplifying
