Building a Simple Email Spam Detection Model: Train, Save, and Use Locally with Flask in Python

In machine learning, building a model is just the start; the real value comes from being able to run it locally for quick predictions. Today, I’ll walk you through creating a basic email spam detection model using scikit-learn, training it on a real dataset from Hugging Face, saving it, and running it on your local machine. This is perfect for beginners looking to experiment with text classification without needing cloud resources.

We’ll use a Naive Bayes classifier with TF-IDF vectorization to separate spam from ham (non-spam). No advanced setups like Ollama are needed here; we’ll keep it simple with pure Python.
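To make the TF-IDF plus Naive Bayes idea concrete, here is a minimal sketch on four toy emails. The example texts and labels are invented for illustration and are not from the Enron dataset; the notebook's actual settings may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy examples invented for illustration; 1 = spam, 0 = ham.
texts = [
    "win a free prize now",
    "claim your free cash prize",
    "meeting moved to monday morning",
    "please review the attached report",
]
labels = [1, 1, 0, 0]

# Turn each email into a row of TF-IDF weights, one column per word.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
print(X.shape)  # (number of emails, vocabulary size)

# Fit Naive Bayes on the TF-IDF features.
clf = MultinomialNB()
clf.fit(X, labels)

# New text must go through the same fitted vectorizer before prediction.
new_X = vectorizer.transform(["free prize waiting for you"])
prediction = clf.predict(new_X)[0]
print("spam" if prediction == 1 else "ham")
```

Words the vectorizer never saw during fitting (here "waiting", "for", "you") are simply ignored, so the prediction rests on "free" and "prize", which only appeared in the spam examples.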

Step 1: Training and Saving the Model

Start by training the model in a Python environment (like Google Colab or a local Jupyter notebook). We’ll load the Enron Spam dataset from Hugging Face, which contains thousands of labeled emails.

Here’s the Colab notebook with the code to train and save the model: https://colab.research.google.com/drive/1-mVCKGFKDn_-Gt29zg7pO2y-JQnOLEuG?usp=sharing

This notebook:

Loads a dataset with real emails.
Trains a pipeline that vectorizes text and classifies it.
Saves the model as a .pkl file for easy reuse.
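The three steps above can be sketched roughly as follows. This is not the notebook's exact code: the tiny in-memory dataset stands in for the Hugging Face data so the example runs offline, and using joblib to write the .pkl file is an assumption (the notebook may use pickle instead).

```python
import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Stand-in data invented for illustration; 1 = spam, 0 = ham.
# The real notebook loads labeled emails from Hugging Face instead.
train_texts = [
    "congratulations you won free money click now",
    "cheap loans approved click here today",
    "free entry win cash prize now",
    "can we reschedule the meeting to friday",
    "here are the notes from the call",
    "lunch tomorrow at noon works for me",
]
train_labels = [1, 1, 1, 0, 0, 0]

# One pipeline object holds both the vectorizer and the classifier,
# so raw text goes in and a label comes out.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("nb", MultinomialNB()),
])
model.fit(train_texts, train_labels)

# Save the whole fitted pipeline to a single file.
joblib.dump(model, "harispam_model.pkl")

# Reload it to confirm the saved file works on raw text.
loaded = joblib.load("harispam_model.pkl")
result = loaded.predict(["free money, click now"])[0]
print("spam" if result == 1 else "ham")
```

Saving the pipeline rather than the classifier alone means the fitted vocabulary travels with the model, so the loading side does not need to rebuild the vectorizer.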

If you’re working in Colab, download harispam_model.pkl to your local machine so you can use it in the next step.

Step 2: Using the Saved Model Locally

Now, on your local machine (with Python installed), load the model and classify new emails. Install dependencies with