In the world of machine learning, building a model is just the start—deploying it locally for quick predictions is where the real value lies. Today, I’ll walk you through creating a basic email spam detection model using scikit-learn, training it on a real dataset from Hugging Face, saving it, and running it on your local machine. This is perfect for beginners looking to experiment with text classification without needing cloud resources.
We’ll use a Naive Bayes classifier with TF-IDF vectorization for spam vs. ham (non-spam) detection. No advanced setups like Ollama are needed here—we’ll keep it simple with pure Python.
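To make the two stages concrete, here is a minimal sketch of TF-IDF vectorization feeding a Naive Bayes classifier. The four emails and the label encoding (1 = spam, 0 = ham) are made-up stand-ins for illustration, not the Enron data or the notebook's actual code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data; the real notebook trains on the Enron emails.
texts = [
    "Win a free prize now, click here",
    "Cheap meds, limited offer, buy now",
    "Meeting moved to 3pm, see agenda attached",
    "Can you review the quarterly report today?",
]
labels = [1, 1, 0, 0]  # assumed encoding: 1 = spam, 0 = ham

# Stage 1: turn each email into a row of TF-IDF weights.
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(texts)  # sparse matrix, one row per email

# Stage 2: fit Naive Bayes on those weighted word counts.
classifier = MultinomialNB()
classifier.fit(features, labels)

# New text must go through the same fitted vectorizer before prediction.
new_email = ["Click here to claim your free prize"]
prediction = classifier.predict(vectorizer.transform(new_email))[0]
print("spam" if prediction == 1 else "ham")
```

The vectorizer learns its vocabulary during training, so the same fitted vectorizer has to travel with the classifier. That is the reason for bundling both into one pipeline later on.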
Step 1: Training and Saving the Model
Start by training the model in a Python environment (like Google Colab or a local Jupyter notebook). We’ll load the Enron Spam dataset from Hugging Face, which contains thousands of labeled emails.
Here's the Colab notebook with the code to train and save the model: https://colab.research.google.com/drive/1-mVCKGFKDn_-Gt29zg7pO2y-JQnOLEuG?usp=sharing
This notebook:
- Loads a dataset with real emails.
- Trains a pipeline that vectorizes text and classifies it.
- Saves the model as a .pkl file for easy reuse.
If you're working in Colab, download harispam_model.pkl to your local machine.
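The train-and-save step can be sketched roughly as follows. This is not the notebook's code: the toy emails stand in for the Enron dataset, and using the standard-library pickle module is an assumption (joblib is another common choice for scikit-learn models). Only the file name harispam_model.pkl comes from the post.

```python
import pickle
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical stand-in data; the notebook uses the Enron emails instead.
texts = [
    "Congratulations, you won a lottery, claim your cash",
    "Lowest price on watches, order today",
    "Attached is the draft contract for your review",
    "Lunch on Thursday to discuss the project plan?",
]
labels = [1, 1, 0, 0]  # assumed encoding: 1 = spam, 0 = ham

# One object holds both the fitted vectorizer and the classifier.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("classifier", MultinomialNB()),
])
pipeline.fit(texts, labels)

# Serialize the whole pipeline so it can be reloaded without retraining.
MODEL_PATH = Path("harispam_model.pkl")
with MODEL_PATH.open("wb") as f:
    pickle.dump(pipeline, f)

print(f"Saved {MODEL_PATH} ({MODEL_PATH.stat().st_size} bytes)")
```

Saving the pipeline rather than the bare classifier means the learned vocabulary is stored in the same file, so raw email text can be passed straight to predict after loading.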
Step 2: Using the Saved Model Locally
Now, on your local machine (with Python installed), load the model and classify new emails. Install the dependencies first; the requirements.txt file in the repo below lists them.
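Loading the saved file and classifying a few emails might look like the sketch below. It assumes the model was written with pickle and that label 1 means spam; load_model and classify_emails are hypothetical helper names, not functions from the repo.

```python
import pickle
from pathlib import Path


def load_model(path):
    """Load a pickled scikit-learn pipeline. Only unpickle files you trust."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


def classify_emails(model, emails, spam_label=1):
    """Return 'spam' or 'ham' for each email (assumes spam_label marks spam)."""
    predictions = model.predict(list(emails))
    return ["spam" if p == spam_label else "ham" for p in predictions]


samples = [
    "You have been selected for a free cruise, reply now",
    "Here are the minutes from this morning's meeting",
]

model_path = Path("harispam_model.pkl")
if model_path.exists():
    model = load_model(model_path)
    for email, verdict in zip(samples, classify_emails(model, samples)):
        print(f"{verdict:>4}: {email}")
else:
    print(f"{model_path} not found; run the training notebook first.")
```

Unpickling works most reliably when the local scikit-learn version matches the one used for training, so it is worth checking both environments if loading fails or prints a version warning.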
Once you have saved your model, clone the GitHub repo: https://github.com/HariharanPalanisamyUAE/EmailSpamAIModule
Then follow these steps:
- Create a virtual environment: python -m venv env (use python3 if that is how Python is invoked on your system), then activate it.
- Install the required libraries: pip install -r requirements.txt
- Set the Flask app (for example via the FLASK_APP environment variable).
- Run the Flask application: flask run
Why This Matters
This approach demonstrates how easy it is to go from data to a deployable model. With an accuracy often above 95% on the Enron dataset, it’s a solid starting point for spam filtering. Experiment by tweaking the dataset or model for better results!
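An accuracy figure like this is normally measured on emails the model never saw during training. The sketch below shows one way to compute it with a held-out split. The twelve emails are invented stand-ins, so the number it prints says nothing about performance on the Enron data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data, far too small for a meaningful score.
spam = [
    "Win cash now, click the link",
    "Free prize waiting, claim today",
    "Cheap loans approved instantly",
    "Buy discount pills online now",
    "You won a free vacation, click here",
    "Limited offer, claim your cash prize",
]
ham = [
    "Agenda for the meeting on Monday",
    "Please review the attached report",
    "Can we reschedule the project call?",
    "Notes from the budget meeting attached",
    "The quarterly report is ready for review",
    "Lunch after the project meeting?",
]
texts = spam + ham
labels = [1] * len(spam) + [0] * len(ham)  # assumed encoding: 1 = spam

# Hold out 25% of the emails; stratify keeps both classes in each split.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Held-out accuracy: {accuracy:.2%}")
```

With the real dataset, the same pattern applies: swap in the Enron texts and labels, keep the split fixed with random_state, and compare scores as you tweak the vectorizer or classifier.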
If you’re into ML, try this out and share your tweaks in the comments. What’s your go-to dataset for text classification?
#MachineLearning #SpamDetection #Python #AI