Lecture

Building a Machine Learning Model for Spam Email Classification

In this lesson, we’ll create a machine learning model that classifies an email as spam or normal based on its content.

Run each code cell by pressing Shift + Enter to train your machine learning model and predict the spam status of new emails.

The process of creating the machine learning model consists of the following steps:


1. Preparing the Data

We'll start by loading a small dataset of 10 emails, written as a CSV-formatted string.

CSV stands for Comma-Separated Values, a plain-text file format in which each line is one row and the values within a row are separated by commas.


CSV files are commonly used in data analysis and machine learning because they store tabular data in a simple, readable format that can be opened in tools such as Excel.

Example of a CSV file
email_text,label
"Free money now!!! Click here",spam
"Limited offer just for you",spam
"Meeting schedule attached",normal
"Please review the project proposal",normal
"Win a free iPhone by answering this",spam
"Lunch meeting confirmed for tomorrow",normal
"Urgent: Your account has been suspended",spam

Each row holds one email: the email_text column contains the message text, and the label column contains its class, either spam or normal.
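
As a minimal sketch of this step, the example rows above can be parsed from a string with Python's standard csv module. The variable names (csv_text, rows, texts, labels) are illustrative assumptions, and the lesson's own code cells may load the data differently, for example with pandas.

```python
import csv
import io

# Sample data copied from the example CSV above (7 rows, for illustration only).
csv_text = '''email_text,label
"Free money now!!! Click here",spam
"Limited offer just for you",spam
"Meeting schedule attached",normal
"Please review the project proposal",normal
"Win a free iPhone by answering this",spam
"Lunch meeting confirmed for tomorrow",normal
"Urgent: Your account has been suspended",spam
'''

# io.StringIO lets the csv module read a string as if it were a file.
# DictReader uses the first line (email_text,label) as the column names.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Split the table into model inputs (texts) and targets (labels).
texts = [row["email_text"] for row in rows]
labels = [row["label"] for row in rows]

print(len(rows), "emails loaded")
print(texts[0], "->", labels[0])
```

Keeping the texts and labels in two parallel lists makes them easy to pass to a vectorizer and a classifier in the later steps.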


2. Text Vectorization (TF-IDF)

Machine learning models cannot directly understand text data, so we need a process to convert sentences into numbers.

Here, we use TF-IDF (Term Frequency-Inverse Document Frequency) vectorization. This method turns each email into a vector of numbers, one per word in the vocabulary. A word's weight grows with how often it appears in that email and shrinks with how many emails in the dataset contain it.
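
The sketch below shows what this conversion might look like, assuming scikit-learn's TfidfVectorizer with its default settings. The lesson does not name the library it uses, so treat that choice and the three sample emails as assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Three sample emails taken from the example CSV (illustrative subset).
texts = [
    "Free money now!!! Click here",
    "Meeting schedule attached",
    "Win a free iPhone by answering this",
]

# fit_transform learns the vocabulary and returns one row of weights per email.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# Rows are emails, columns are distinct words.
print(X.shape)

# Compare two words from the first email: "free" also appears in the third
# email, while "money" appears only in the first, so "money" is rarer.
vocab = vectorizer.vocabulary_   # maps each word to its column index
row0 = X[0].toarray()[0]         # dense weights for the first email
print("free :", round(row0[vocab["free"]], 3))
print("money:", round(row0[vocab["money"]], 3))
```

With the default settings the text is lowercased and single-character tokens such as "a" are dropped, which is why the vocabulary is slightly smaller than the raw word count.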


3. Model Training and Evaluation

Now, we will train our machine learning model with the vectorized data.

We will use the Multinomial Naive Bayes algorithm, which is well-suited for text classification tasks.
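
One way this step could look is sketched below, assuming scikit-learn's MultinomialNB, train_test_split, and accuracy_score. The 70/30 split, the random_state value, and the seven sample emails are assumptions made for illustration, not the lesson's actual settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Sample emails and labels from the example CSV (illustrative data).
texts = [
    "Free money now!!! Click here",
    "Limited offer just for you",
    "Meeting schedule attached",
    "Please review the project proposal",
    "Win a free iPhone by answering this",
    "Lunch meeting confirmed for tomorrow",
    "Urgent: Your account has been suspended",
]
labels = ["spam", "spam", "normal", "normal", "spam", "normal", "spam"]

# Hold out part of the data for evaluation; stratify keeps both classes
# present in the training set and the test set.
train_texts, test_texts, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, random_state=42, stratify=labels
)

# Learn the vocabulary from the training emails only, then reuse it
# unchanged for the test emails.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# Train the classifier on the TF-IDF features.
model = MultinomialNB()
model.fit(X_train, y_train)

# Accuracy is the fraction of held-out emails labeled correctly.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy on {len(y_test)} held-out emails: {accuracy:.2f}")
```

With so few emails the accuracy figure is only a rough indication and can change noticeably with a different split.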


4. Predictions for New Emails

Finally, we will use the trained model to predict whether a new email is spam or not.
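
A minimal sketch of the prediction step follows, again assuming scikit-learn. It chains the vectorizer and the classifier with make_pipeline so that new emails pass through the same TF-IDF transformation as the training emails. The two new emails and the variable names are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Sample training emails and labels from the example CSV (illustrative data).
texts = [
    "Free money now!!! Click here",
    "Limited offer just for you",
    "Meeting schedule attached",
    "Please review the project proposal",
    "Win a free iPhone by answering this",
    "Lunch meeting confirmed for tomorrow",
    "Urgent: Your account has been suspended",
]
labels = ["spam", "spam", "normal", "normal", "spam", "normal", "spam"]

# The pipeline vectorizes the text and then classifies it, so the same
# vocabulary and weights are applied at training and prediction time.
spam_classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
spam_classifier.fit(texts, labels)

# Hypothetical new emails that the model has not seen before.
new_emails = [
    "Free iPhone offer, click now",
    "Project meeting tomorrow",
]
predictions = spam_classifier.predict(new_emails)

for email, label in zip(new_emails, predictions):
    print(f"{label:6} <- {email}")
```

Words that never appeared in the training emails are ignored by the vectorizer, so an email made up entirely of unseen words is classified from the class frequencies alone.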


With only a small dataset and a few lines of code, a machine learning model can classify text automatically. In the next lesson, we will learn about the essential Python libraries for studying machine learning and deep learning.
