Predicting Data with a Line - Linear Regression
Linear Regression is a method that fits a line (or a plane in higher dimensions) to data in order to learn its patterns. Given new input data, it can then predict corresponding numerical values.
The simplest form, Simple Linear Regression, can be expressed with the following formula:

Y = WX + B
The meaning of each variable is as follows:
- $X$: Input data (e.g., study hours)
- $Y$: The value to be predicted (e.g., test score)
- $W$: Slope (Weight); determines how much the result changes as the input value increases
- $B$: Intercept (Bias); the point where the line meets the Y-axis
Using this equation, you can predict the $Y$ value for any given $X$ value.
For instance, if a student scored 40 points after 1 hour of study and 60 points after 2 hours, with a base score of 10, the $W$ and $B$ values are calculated as follows:

B = 10
W = (60 - 40) / (2 - 1) = 20
Based on this information, the linear regression model is learned as Y = 20X + 10.
For 3 hours of study, the test score is calculated as follows:
Y = 20 * 3 + 10 = 70
According to the regression model, this student is predicted to score 70 points with 3 hours of study.
While this example uses only 2 data points for simplicity, in reality, linear regression models are trained using extensive datasets.
How Linear Regression Works
Linear regression learns by finding the optimal line in the given data.
To achieve this, it needs to find the optimal $W$ and $B$ values that minimize the loss function.
The most commonly used loss function is MSE (Mean Squared Error), which measures the average squared difference between the model's predictions and the actual values. Smaller values indicate that the model has learned better.
Machine learning models use an algorithm known as Gradient Descent to reduce the loss and find the best $W$ and $B$.
Limitations of Linear Regression
Linear regression is a simple and easy-to-interpret algorithm, but it has several limitations.
1. Can Only Learn Linear Relationships
If the data does not follow a linear relationship, the predictive performance of a linear regression model can diminish.
For example, in cases with U-shaped or S-shaped data patterns, linear regression is not suitable.
2. Sensitive to Outliers
If there are extreme data points (outliers), they can pull the fitted line significantly away from the overall trend.
3. Limited Predictive Power with Insufficient Features
In many real-world situations, it's difficult to determine outcomes (Y) using just one variable (X).
In such cases, Multiple Linear Regression can be used to incorporate multiple input variables.
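A brief sketch of multiple linear regression, assuming NumPy is available (the data here is hypothetical, generated from score = 20 * study + 5 * sleep + 10 so the recovered coefficients are easy to check):

```python
import numpy as np

# Hypothetical data: [study hours, sleep hours] -> test score.
X = np.array([[1, 6], [2, 7], [3, 5], [4, 8]], dtype=float)
y = 20 * X[:, 0] + 5 * X[:, 1] + 10

# Append a column of ones so the solver also fits the intercept B.
X_with_bias = np.hstack([X, np.ones((len(X), 1))])
coeffs, *_ = np.linalg.lstsq(X_with_bias, y, rcond=None)
print(coeffs)  # recovers approximately [20., 5., 10.]
```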
Linear regression is one of the fundamental algorithms in machine learning that analyzes data in a linear form to predict numerical values.
In the next lesson, we will explore Logistic Regression.
Linear regression is a method of finding the line that best explains the pattern in the data.