Building a Machine Learning Model for Spam Email Classification
In this lesson, we’ll create a machine learning model that classifies an email as spam or normal based on its content.
Run each code cell by pressing Shift + Enter to train the model and predict whether new emails are spam.
The process of creating the machine learning model consists of the following steps:
1. Preparing the Data
We'll start by loading a small dataset of 10 emails, written in CSV format and stored as a string.
CSV stands for Comma-Separated Values, a file format in which each line holds one record and the fields within a record are separated by commas.
CSV files are commonly used in data analysis and machine learning because they store tabular data in a simple, readable format that can be opened in tools such as Excel.
    email_text,label
    "Free money now!!! Click here",spam
    "Limited offer just for you",spam
    "Meeting schedule attached",normal
    "Please review the project proposal",normal
    "Win a free iPhone by answering this",spam
    "Lunch meeting confirmed for tomorrow",normal
    "Urgent: Your account has been suspended",spam
Each email's text is stored in the email_text column, and its classification, either spam or normal, is stored in the label column.
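A minimal sketch of the loading step, assuming the data is parsed with Python's built-in csv module (the lesson does not say which tool it uses; with pandas, pd.read_csv(io.StringIO(CSV_DATA)) would give the same two columns as a DataFrame). CSV_DATA and load_emails are illustrative names, and only the seven rows shown above are included.

```python
import csv
import io

# The sample emails shown above, stored as a CSV-formatted string.
CSV_DATA = """email_text,label
"Free money now!!! Click here",spam
"Limited offer just for you",spam
"Meeting schedule attached",normal
"Please review the project proposal",normal
"Win a free iPhone by answering this",spam
"Lunch meeting confirmed for tomorrow",normal
"Urgent: Your account has been suspended",spam
"""


def load_emails(csv_text):
    """Parse CSV text into two parallel lists: email texts and their labels."""
    # DictReader uses the header row (email_text,label) as the field names
    # and strips the double quotes around each email text.
    reader = csv.DictReader(io.StringIO(csv_text))
    texts, labels = [], []
    for row in reader:
        texts.append(row["email_text"])
        labels.append(row["label"])
    return texts, labels


texts, labels = load_emails(CSV_DATA)
print(len(texts), "emails loaded")
print(texts[0], "->", labels[0])
```

Keeping texts and labels as parallel lists matches what the later steps need: the texts are vectorized, and the labels are the targets the model learns to predict.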
2. Text Vectorization (TF-IDF)
Machine learning models cannot directly understand text data, so we need a process to convert sentences into numbers.
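Here, we use TF-IDF (Term Frequency-Inverse Document Frequency) vectorization. This method turns each email into a vector of numbers, one per word in the vocabulary, where each value reflects how important that word is to the email: it grows with how often the word appears in the email and shrinks the more emails in the dataset contain it.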
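To make the weighting concrete, here is a from-scratch sketch that is intended to mirror the defaults of scikit-learn's TfidfVectorizer (an assumption: the lesson does not name the library). For a word t in email d, the weight is tf(t, d) × idf(t), where tf is the raw count of t in d and idf(t) = ln((1 + n) / (1 + df(t))) + 1, with n the number of emails and df(t) the number of emails containing t. Each email's vector is then scaled to unit length. The function names tokenize and tfidf are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase, then keep runs of two or more word characters
    # (single characters such as "a" are dropped).
    return re.findall(r"\b\w\w+\b", text.lower())


def tfidf(documents):
    """Return (vocabulary, vectors): one unit-length TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(documents)
    # Document frequency: the number of documents each word appears in.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency: rarer words get larger weights.
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1 for w in vocabulary}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)  # raw count of each word in this document
        vec = [tf[w] * idf[w] for w in vocabulary]
        # Scale to unit (L2) length; guard against an empty document.
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocabulary, vectors


emails = [
    "Free money now!!! Click here",
    "Win a free iPhone by answering this",
    "Lunch meeting confirmed for tomorrow",
]
vocab, vecs = tfidf(emails)
weights = dict(zip(vocab, vecs[0]))
# "free" appears in two of the three emails, "money" in only one,
# so "money" gets the larger weight in the first email.
print(round(weights["free"], 3), round(weights["money"], 3))
```

In practice you would not write this by hand; if the assumption about scikit-learn's defaults holds, TfidfVectorizer().fit_transform(emails) should produce the same numbers up to floating-point rounding.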
3. Model Training and Evaluation
Now, we will train our machine learning model on the vectorized data.
We will use the Multinomial Naive Bayes algorithm, which is well suited to text classification tasks because it models how frequently each word occurs in emails of each class.
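A minimal sketch of training and a basic evaluation, assuming scikit-learn's TfidfVectorizer and MultinomialNB (the lesson names the techniques but not the library). It reuses the seven emails shown in the CSV above. Because there are too few emails for a separate test set, it only measures accuracy on the training data, which checks that the model fits the examples rather than how well it generalizes.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# The sample emails and labels shown in the CSV above.
texts = [
    "Free money now!!! Click here",
    "Limited offer just for you",
    "Meeting schedule attached",
    "Please review the project proposal",
    "Win a free iPhone by answering this",
    "Lunch meeting confirmed for tomorrow",
    "Urgent: Your account has been suspended",
]
labels = ["spam", "spam", "normal", "normal", "spam", "normal", "spam"]

# Learn the vocabulary and IDF weights, then vectorize every email.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)  # sparse matrix: one row per email

# Fit Multinomial Naive Bayes (default additive smoothing, alpha=1.0).
model = MultinomialNB()
model.fit(X, labels)

# Training-set accuracy only: a sanity check, not a measure of generalization.
predictions = model.predict(X)
accuracy = accuracy_score(labels, predictions)
print("Vocabulary size:", X.shape[1])
print("Training accuracy:", accuracy)
```

With a larger dataset, you would hold out part of the emails (for example with sklearn.model_selection.train_test_split) and report accuracy on those instead.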
4. Predictions for New Emails
Finally, we will use the trained model to predict whether a new email is spam or not.
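A minimal sketch of the prediction step, again assuming scikit-learn. It bundles the vectorizer and classifier with make_pipeline so that new emails go through exactly the same TF-IDF transformation as the training emails. The two new emails are hypothetical examples, not part of the lesson's dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Training data: the sample emails shown in the CSV above.
texts = [
    "Free money now!!! Click here",
    "Limited offer just for you",
    "Meeting schedule attached",
    "Please review the project proposal",
    "Win a free iPhone by answering this",
    "Lunch meeting confirmed for tomorrow",
    "Urgent: Your account has been suspended",
]
labels = ["spam", "spam", "normal", "normal", "spam", "normal", "spam"]

# Vectorize, then classify, as one fitted object.
classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(texts, labels)

# Hypothetical new emails to classify.
new_emails = [
    "Claim your free prize now",
    "Project meeting moved to Friday",
]
predictions = classifier.predict(new_emails)
for email, label in zip(new_emails, predictions):
    print(f"{label}: {email}")
# spam: Claim your free prize now
# normal: Project meeting moved to Friday
```

Words the model never saw during training (here "claim", "prize", "moved", "to", and "friday") are simply ignored by the vectorizer, so each prediction rests on the familiar words: "your", "free", and "now" in the first email, and "project" and "meeting" in the second.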
With just simple data and a small amount of code, a machine learning model can classify text automatically. In the next lesson, we will learn about the essential Python libraries for studying machine learning and deep learning.