Label Encoding and One-Hot Encoding
In this lesson, we will learn about Label Encoding and One-Hot Encoding, two methods used to convert categorical data into numerical form during preprocessing.
1. Label Encoding
This method assigns each category a unique integer.
| Student Name | Favorite Subject | Label Encoding Value |
|--------------|------------------|----------------------|
| John | Math | 0 |
| Emily | English | 1 |
| Sarah | Science | 2 |
| Mike | Math | 0 |
Label encoding is simple and compact: it adds no new columns, and each category becomes a single integer.
However, the integers can incorrectly imply an order or ranking among the categories.
For example, the data above might suggest that Math (0) < English (1) < Science (2) reflects an order of importance.
An AI model trained on such encodings may treat this artificial order as meaningful, which can lead to incorrect predictions.
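The mapping in the table can be reproduced with a few lines of plain Python. This is a minimal sketch, not a library implementation: the helper name `label_encode` is hypothetical, and it assumes codes are assigned in order of first appearance, which is what the table shows.

```python
subjects = ["Math", "English", "Science", "Math"]

def label_encode(values):
    """Assign each distinct category an integer, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            # The next unused integer is the number of categories seen so far.
            mapping[value] = len(mapping)
    codes = [mapping[value] for value in values]
    return codes, mapping

codes, mapping = label_encode(subjects)
print(codes)    # [0, 1, 2, 0]
print(mapping)  # {'Math': 0, 'English': 1, 'Science': 2}
```

Library encoders may number the categories differently. For instance, scikit-learn's `LabelEncoder` assigns codes in sorted order, so English would receive 0 and Math 1. The specific numbers are arbitrary either way, which is exactly why they should not be read as a ranking.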
2. One-Hot Encoding
This method creates a new column for each category and places a 1 in the column corresponding to the row's category and a 0 in every other column.
| Student Name | Favorite Subject | Math | English | Science |
|--------------|------------------|------|---------|---------|
| John | Math | 1 | 0 | 0 |
| Emily | English | 0 | 1 | 0 |
| Sarah | Science | 0 | 0 | 1 |
| Mike | Math | 1 | 0 | 0 |
One-hot encoding prevents misinterpretation of numerical order by using only 0s and 1s.
However, it adds one new column per distinct category, so the dataset can become very wide when a feature has many categories.
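The one-hot table above can be sketched in plain Python as follows. The helper name `one_hot_encode` is hypothetical, and the column order (order of first appearance) is an assumption chosen to match the table.

```python
subjects = ["Math", "English", "Science", "Math"]

def one_hot_encode(values):
    """Return one row of 0s and 1s per value, plus the list of column names."""
    # Collect distinct categories in order of first appearance.
    categories = []
    for value in values:
        if value not in categories:
            categories.append(value)
    # Each row has a 1 in the column matching its category and 0 elsewhere.
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return rows, categories

rows, categories = one_hot_encode(subjects)
print(categories)  # ['Math', 'English', 'Science']
for row in rows:
    print(row)
# [1, 0, 0]
# [0, 1, 0]
# [0, 0, 1]
# [1, 0, 0]
```

Each row has as many entries as there are distinct categories, which makes the size cost visible: a feature with 1,000 distinct values would produce 1,000 columns. In practice, library functions such as pandas `get_dummies` or scikit-learn's `OneHotEncoder` are commonly used for this step.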
Which Should You Use?
✔ Label Encoding: Compact, but not recommended for unordered data like subjects, because the numbers can imply a misleading order.
✔ One-Hot Encoding: More suitable for unordered data, but can be inefficient if there are too many categories.
👉 In general, if order doesn’t matter, one-hot encoding is the preferred method.
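The rule of thumb above can be expressed as a small helper. This is only a sketch under stated assumptions: the function `encode`, its `order` parameter, and the "Low/Medium/High" example data are hypothetical and not part of the lesson. When an explicit order is supplied, integers are used because they carry real meaning; otherwise the values are one-hot encoded.

```python
def encode(values, order=None):
    """Encode categories as integers if `order` is given, else as one-hot rows.

    `order` is a hypothetical parameter listing the categories from lowest
    to highest when the data has a genuine ranking.
    """
    if order is not None:
        # Ordered data: the integer reflects the category's actual rank.
        rank = {category: i for i, category in enumerate(order)}
        return [rank[value] for value in values]
    # Unordered data: one column per distinct category, in first-seen order.
    categories = list(dict.fromkeys(values))
    return [[int(value == category) for category in categories]
            for value in values]

# Hypothetical ordered feature: the ranking Low < Medium < High is real.
levels = ["Low", "High", "Medium"]
print(encode(levels, order=["Low", "Medium", "High"]))  # [0, 2, 1]

# Unordered feature from the lesson: no subject outranks another.
subjects = ["Math", "English", "Science", "Math"]
print(encode(subjects))  # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```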
In the next lesson, we will tackle a simple quiz to review what we have learned so far.