Lecture

Categorical Data Encoding

Most AI and machine learning models can only process numerical input.

However, much of the data we work with is text-based.

This kind of data, grouped into categories without numerical meaning, is called categorical data.

Example of Categorical Data

| ID | Color | Region | Occupation |
|----|--------|-------------|------------|
| 1 | Red | New York | Student |
| 2 | Blue | Chicago | Employee |
| 3 | Green | Los Angeles | Student |
| 4 | Yellow | New York | Doctor |

In the data above, the Color, Region, and Occupation columns hold categorical data.

These values cannot be used directly in calculations, and for columns like these, comparing magnitude or order is not meaningful.

Categorical data can be divided into two main types.


Nominal Data

This is categorical data without any order. Examples of nominal data include colors (red, blue, green) and regions (New York, Chicago, Los Angeles).

Ordinal Data

This is categorical data with an order. Examples of ordinal data include education levels (elementary, middle, high school) and customer satisfaction levels (low, medium, high).

Categorical data needs to be converted into numerical form for machine learning, a process known as encoding.
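To make the difference between nominal and ordinal data concrete, here is a minimal sketch in plain Python. The sample responses and the variable names are made up for illustration. The point is only that an ordinal column has a natural order that integer ranks can preserve, while a nominal column does not.

```python
# Ordinal data: the categories have a meaningful order, so we can
# assign ranks that respect it (low < medium < high).
satisfaction_order = ["low", "medium", "high"]
rank = {level: i for i, level in enumerate(satisfaction_order)}

# Hypothetical survey responses, used only for this example.
responses = ["high", "low", "medium", "high"]
ranked = [rank[r] for r in responses]
print(ranked)

# Comparing ranks is meaningful for ordinal data.
print(rank["low"] < rank["high"])

# Nominal data: there is no natural order, so any numbering of
# regions would be arbitrary and comparisons would carry no meaning.
regions = ["New York", "Chicago", "Los Angeles"]
```

For the ordinal column, the comparison `rank["low"] < rank["high"]` reflects a real relationship in the data. No such comparison makes sense for the regions.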


What is Data Encoding?

Categorical data must be transformed into numbers so that machine learning models can comprehend it. This transformation process is known as data encoding.

For example, let's convert the color data above into numbers.

Color Data Encoding

| ID | Color | Color (Encoded) |
|----|--------|-----------------|
| 1 | Red | 0 |
| 2 | Blue | 1 |
| 3 | Green | 2 |
| 4 | Yellow | 3 |

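This allows the model to process color data numerically. Note that the numbers here are only labels: 3 (Yellow) is not "greater than" 0 (Red) in any meaningful sense.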
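The mapping in the table can be sketched in a few lines of plain Python. This is one possible way to do it, assuming codes are assigned in order of first appearance. The variable names are illustrative.

```python
# Color column from the example table.
colors = ["Red", "Blue", "Green", "Yellow"]

# Give each distinct color an integer code in order of first appearance.
# dict.fromkeys drops duplicates while keeping insertion order.
color_to_code = {color: code for code, color in enumerate(dict.fromkeys(colors))}

# Replace every color with its code.
encoded = [color_to_code[color] for color in colors]

print(color_to_code)  # {'Red': 0, 'Blue': 1, 'Green': 2, 'Yellow': 3}
print(encoded)        # [0, 1, 2, 3]
```

Keeping the `color_to_code` dictionary around is useful, since the same mapping must be applied to any new data the model sees later.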

Common methods for this transformation include Label Encoding and One-Hot Encoding.

We will discuss each method in more detail in the following lessons.
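As a brief preview, the two methods can be sketched in plain Python. This is a simplified illustration under the assumption of a fixed category list, not how any particular library implements them. Label encoding produces one integer per value, while one-hot encoding produces one 0/1 column per category.

```python
# Fixed category list, in the same order as the encoding table.
categories = ["Red", "Blue", "Green", "Yellow"]
colors = ["Red", "Blue", "Green", "Yellow"]

# Label encoding: each value becomes the index of its category.
label_encoded = [categories.index(c) for c in colors]

# One-hot encoding: each value becomes a vector with a single 1
# in the position of its category and 0 everywhere else.
one_hot = [[1 if c == cat else 0 for cat in categories] for c in colors]

print(label_encoded)  # [0, 1, 2, 3]
for color, vector in zip(colors, one_hot):
    print(color, vector)
```

Each one-hot vector contains exactly one 1, so no category appears larger or smaller than another.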

Quiz
What is the process of converting categorical data into numbers called?

Standardization

Normalization

Encoding

Clustering
