
Label Encoding and One-Hot Encoding

In this lesson, we will learn about Label Encoding and One-Hot Encoding, methods used to convert categorical data into numerical form during preprocessing.


1. Label Encoding

This method assigns each distinct category its own integer.

Label Encoding Example
| Student Name | Favorite Subject | Label Encoding Value |
|--------------|------------------|----------------------|
| John         | Math             | 0                    |
| Emily        | English          | 1                    |
| Sarah        | Science          | 2                    |
| Mike         | Math             | 0                    |

Label encoding is simple and compact: each category becomes a single integer, so no extra columns are needed.
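The mapping in the table above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the function name `label_encode` is hypothetical, and it assumes codes are assigned in order of first appearance, which is what the table shows (Math first, then English, then Science).

```python
subjects = ["Math", "English", "Science", "Math"]  # favorite subjects from the table


def label_encode(values):
    """Assign each distinct category an integer, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)  # next unused integer
    codes = [mapping[value] for value in values]
    return codes, mapping


codes, mapping = label_encode(subjects)
print(codes)    # [0, 1, 2, 0]
print(mapping)  # {'Math': 0, 'English': 1, 'Science': 2}
```

John and Mike both like Math, so they receive the same code, 0.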

However, it may incorrectly imply that the numerical order reflects actual importance or ranking.

For example, the above data might suggest that Math(0) < English(1) < Science(2) reflects an order of importance.

An AI model that treats these codes as quantities can learn this spurious order and make incorrect predictions.
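A small sketch of why the codes can mislead, using the values from the table. It only does arithmetic on the codes, which is roughly what a model does when it treats the column as an ordinary number; the variable names are illustrative.

```python
codes = {"Math": 0, "English": 1, "Science": 2}  # label codes from the table

# Treated as numbers, the codes put Math "closer" to English than to Science,
# even though no such relationship exists between the subjects.
gap_math_english = abs(codes["Math"] - codes["English"])
gap_math_science = abs(codes["Math"] - codes["Science"])
print(gap_math_english, gap_math_science)  # 1 2

# Averaging the codes for Math and Science gives English's code,
# which has no meaning for school subjects.
mean_code = (codes["Math"] + codes["Science"]) / 2
print(mean_code)  # 1.0
```

Neither result reflects anything real about the subjects. Both come entirely from the arbitrary choice of integers.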


2. One-Hot Encoding

This method creates a new column for each category and places a 1 in the column that matches the row's category and a 0 in every other column.

One-Hot Encoding Example
| Student Name | Favorite Subject | Math | English | Science |
|--------------|------------------|------|---------|---------|
| John         | Math             | 1    | 0       | 0       |
| Emily        | English          | 0    | 1       | 0       |
| Sarah        | Science          | 0    | 0       | 1       |
| Mike         | Math             | 1    | 0       | 0       |

One-hot encoding prevents misinterpretation of numerical order by using only 0s and 1s.

However, it adds one new column per distinct category, so a feature with many categories can make the dataset much wider.
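The one-hot table above can be reproduced with a short plain-Python sketch. The function names `one_hot_encode` and `squared_distance` are hypothetical, and columns are assumed to follow the order in which categories first appear, matching the table. The example also checks the two properties just described: every pair of different subjects is equally far apart, and the number of columns equals the number of distinct categories.

```python
subjects = ["Math", "English", "Science", "Math"]  # favorite subjects from the table


def one_hot_encode(values):
    """Return the column names and one row of 0s and 1s per input value."""
    categories = list(dict.fromkeys(values))  # distinct values, first-appearance order
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


categories, rows = one_hot_encode(subjects)
print(categories)  # ['Math', 'English', 'Science']
for row in rows:
    print(row)     # [1, 0, 0] / [0, 1, 0] / [0, 0, 1] / [1, 0, 0]

math_row, english_row, science_row = rows[0], rows[1], rows[2]
# No subject is "closer" to another: every pair of different subjects differs by 2.
print(squared_distance(math_row, english_row))  # 2
print(squared_distance(math_row, science_row))  # 2

# The cost: one column per distinct category.
print(len(rows[0]))  # 3
```

With three subjects the extra width is negligible, but a feature with thousands of distinct values would add thousands of columns.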


Which Should You Use?

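✔ Label Encoding: Not recommended for unordered data like subjects, because the numeric order can mislead the model. It is a better fit when the categories have a natural order, such as small < medium < large.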

✔ One-Hot Encoding: More suitable for unordered data, but can be inefficient if there are too many categories.


👉 In general, if order doesn’t matter, one-hot encoding is the preferred method.

In the next lesson, we will tackle a simple quiz to review what we have learned so far.

Quiz

Label encoding is preferable when the order of the data does not matter.

True
False
