How We Built AI-Powered Spam Detection That Understands What Emails Actually Say

Published: Nov 18, 2025. Last modified: Jun 5, 2026.

Spam filters have worked the same way for 20 years. They count words, check sender reputation, verify authentication, and assign a score. If the score exceeds a threshold, the email is blocked. This approach catches 95% of spam.

The other 5% is the problem.

A fake PayPal email from paypal-notifications-center.com with valid SPF, proper DKIM, and HTML that looks exactly like a real PayPal notification scores 0.1 on traditional filters. It is delivered. The user clicks the link. Credentials stolen.

We built something different. Cleanbox now includes an AI content classifier that reads every email, understands what it is actually saying, and makes a judgment call — just like a human would. And it explains its reasoning in plain English.

The problem with statistical spam filters

Rspamd (the engine Cleanbox uses) is one of the best open-source spam scanners available. Its Bayesian classifier learns from millions of emails. It checks SPF, DKIM, DMARC, URL reputation, content patterns, and hundreds of other signals.

But Bayesian classification is statistical, not contextual. It recognizes patterns it has seen before. It does not understand meaning.

Email	Rspamd score	Why it failed
Fake PayPal — "Unauthorized transaction EUR 847.50"	0.1	Content resembles real PayPal emails (same words, same structure)
AT&T voicemail phishing — "New voicemail message"	1.1	Short, clean HTML, legitimate-looking sender name
Fake Apple ID locked	2.9	Below quarantine threshold on most configs
Sextortion demanding Bitcoin	varies	Unique language patterns not in the training corpus
Fake debt collection from `bintopia.com` pretending to be a real collection agency	low	Bayes has no data on this domain impersonating that brand

In each case, a human would immediately recognize the email as malicious. The domain does not match the brand. The urgency is artificial. The Bitcoin demand is a scam. But the statistical filter does not understand any of that.

How our AI classifier works

We built a custom AI content analysis engine that runs on every incoming email, inline with the SMTP transaction. It is not a bolt-on that runs after delivery — it runs during the spam scanning phase, before the email is accepted or rejected.

The pipeline

1. Email arrives at the Cleanbox SMTP server
2. Content extraction: The sender address, display name, subject, and body are extracted. HTML-only emails are converted to clean plain text (style/script blocks removed, entities decoded, whitespace normalized). Body is capped at a safe length.
3. Cache check: A hash of the subject + body is checked against a classification cache. If this exact content was already analyzed (common with bulk spam), the cached result is returned instantly, with no duplicate processing.
4. AI analysis: If not cached, the email content is analyzed by our AI classification engine using a custom-built prompt that encodes our spam detection rules, brand impersonation patterns, and confidence scoring calibration.
5. Structured response: The AI returns a structured verdict: spam or ham, a confidence score (0.0 to 1.0), and a human-readable reason explaining why.
6. Score integration: The AI verdict is passed to Rspamd alongside all normal checks. A custom scoring module combines the AI verdict with Rspamd's Bayesian score to produce a variable-weight symbol (CLEANBOX_AI_SPAM or CLEANBOX_AI_HAM).
7. Header injection: The AI's reasoning is injected into the delivered email as an X-Cleanbox-Explanation header, visible to the recipient.
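
To make the content extraction step concrete, here is a minimal Python sketch of converting an HTML-only body to plain text. It is an illustration under stated assumptions, not the Cleanbox implementation: the names html_to_text and MAX_BODY_CHARS and the 4000-character cap are hypothetical (the pipeline only specifies "a safe length").

```python
import re
from html.parser import HTMLParser

# Assumed cap for illustration; the article only says "a safe length".
MAX_BODY_CHARS = 4000


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <style> and <script>."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before handle_data.
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html: str, limit: int = MAX_BODY_CHARS) -> str:
    """Strip markup, normalize whitespace, and cap the length."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    # Joining with spaces keeps text from adjacent blocks apart; it can also
    # split a word that is broken up by inline tags, which a fuller version
    # would handle.
    text = " ".join(parser.parts)
    return re.sub(r"\s+", " ", text).strip()[:limit]
```

Capping after normalization means the limit applies to meaningful text rather than to markup or padding whitespace.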

Variable scoring — AI and Bayes work together

The AI does not override Rspamd. It works with it. The AI symbol score is variable, based on two factors:

AI confidence: How certain the AI is about its verdict (0.0 to 1.0)
Bayes agreement: Whether Rspamd's own classifier agrees or disagrees

When both agree, the score is high (strong combined signal). When they disagree, the score is lower (cautious — one of them might be wrong). This prevents the AI from bulldozing legitimate email that Bayes knows is safe.

Scenario	AI says	Bayes says	AI score	Logic
Sextortion email	spam (0.92)	spam	+3.29	Both agree → high confidence
Phishing (Bayes missed it)	spam (0.95)	ham	+2.90	AI disagrees with Bayes → cautious but still catches it
Legitimate newsletter	ham (0.92)	ham	-4.05	Both agree it is safe → strong negative score (protects from false positives)
Cold outreach	spam (0.85)	neutral	+2.98	AI is less confident, Bayes has no opinion → moderate score
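
The idea of a variable-weight symbol can be sketched in a few lines of Python. The weights and agreement factors below are invented for illustration and will not reproduce the scores in the table, because the actual calibration is not given here. The Verdict type and the function names are likewise hypothetical.

```python
from dataclasses import dataclass

# Illustrative numbers only; not the real calibration.
BASE_SPAM_WEIGHT = 3.5
BASE_HAM_WEIGHT = -4.0
AGREEMENT_FACTOR = {"agree": 1.0, "neutral": 0.9, "disagree": 0.8}


@dataclass(frozen=True)
class Verdict:
    label: str         # "spam" or "ham"
    confidence: float  # 0.0 to 1.0
    reason: str        # human-readable explanation


def bayes_relation(ai_label: str, bayes_label: str) -> str:
    """Classify how Bayes relates to the AI verdict."""
    if bayes_label == "neutral":
        return "neutral"
    return "agree" if bayes_label == ai_label else "disagree"


def ai_symbol(verdict: Verdict, bayes_label: str) -> tuple:
    """Return (symbol name, score) scaled by confidence and Bayes agreement."""
    if verdict.label not in ("spam", "ham"):
        raise ValueError("label must be 'spam' or 'ham'")
    if not 0.0 <= verdict.confidence <= 1.0:
        raise ValueError("confidence must be within 0.0 and 1.0")
    is_spam = verdict.label == "spam"
    base = BASE_SPAM_WEIGHT if is_spam else BASE_HAM_WEIGHT
    factor = AGREEMENT_FACTOR[bayes_relation(verdict.label, bayes_label)]
    name = "CLEANBOX_AI_SPAM" if is_spam else "CLEANBOX_AI_HAM"
    return name, round(base * verdict.confidence * factor, 2)
```

With any weights of this shape, the same AI verdict scores highest when Bayes agrees and lowest when Bayes disagrees, which is the behavior the scenarios above describe.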

What the AI actually understands

This is not keyword matching. The AI understands context:

Domain impersonation: It knows that paypal-notifications-center.com is not paypal.com. It knows that id-apple-support.com is not apple.com. It knows that chase-secure-login.net is not chase.com. A statistical filter sees "PayPal" in the content and thinks it is legitimate.
Social engineering patterns: Urgency ("24 hours"), authority ("your account"), consequence ("permanent disable"), and action ("verify now") are recognized as manipulation tactics, not just words.
Scam structures: Sextortion follows a recognizable structure: claim of compromise → threat of exposure → Bitcoin demand → deadline. The AI recognizes this structure regardless of the specific words used.
Language and intent: "I noticed your website could use better SEO" from an unknown sender is cold outreach. The same words from a known contact in a reply thread are legitimate advice. Context matters.
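
The classifier reaches these judgments by reading the message, not by applying fixed rules. Still, the domain comparison in the first point can be illustrated with a deliberately crude rule-based check, shown here only to make the mismatch concrete. The brand table and function names are hypothetical and are not Cleanbox data.

```python
from typing import Optional

# Hypothetical brand-to-domain table for illustration only.
OFFICIAL_DOMAINS = {
    "paypal": {"paypal.com"},
    "apple": {"apple.com"},
    "chase": {"chase.com"},
}


def _belongs_to(domain: str, official: str) -> bool:
    """True for the official domain itself or any of its subdomains."""
    return domain == official or domain.endswith("." + official)


def impersonated_brand(sender_domain: str) -> Optional[str]:
    """Return the brand whose name appears in a domain it does not own."""
    domain = sender_domain.lower().rstrip(".")
    for brand, officials in OFFICIAL_DOMAINS.items():
        if brand not in domain:
            continue
        if any(_belongs_to(domain, official) for official in officials):
            continue
        # Substring matching is naive: "pineapple.com" would be flagged too.
        return brand
    return None


print(impersonated_brand("paypal-notifications-center.com"))  # paypal
print(impersonated_brand("mail.paypal.com"))                  # None
```

The false positive noted in the comment is the kind of case where fixed rules break down and contextual judgment is needed.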

The explanation header

Every email processed by the AI classifier gets an X-Cleanbox-Explanation header injected into the delivered message. This is the AI's reasoning in plain English:

X-Cleanbox-Explanation: sender domain paypal-notifications-center.com is not paypal.com, urgency + fake dispute link

X-Cleanbox-Explanation: classic sextortion scam with threats of webcam recordings, demands for Bitcoin payment, fake malware claims, and password extortion tactics

X-Cleanbox-Explanation: legitimate newsletter from official company domain careers.microsoft.com with job vacancy listings and subscription management links

This header is visible in email clients that show full headers, and in the Cleanbox message detail page. It gives you instant insight into why the AI classified an email the way it did — no guessing, no looking up cryptic rule names.
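
As a rough sketch of what header injection involves, the following Python uses the standard library email package to add the header to a message. The function name add_explanation is an assumption, and the real system may do this at a different layer. Two details are worth showing: the reason is collapsed to a single line so it cannot smuggle in extra headers, and any copy of the header supplied by the sender is removed first.

```python
import re
from email import message_from_string, policy

HEADER_NAME = "X-Cleanbox-Explanation"


def add_explanation(raw_message: str, reason: str) -> str:
    """Return the message with a single, sanitized explanation header."""
    msg = message_from_string(raw_message, policy=policy.default)
    # Collapse newlines and runs of whitespace so the value stays on one
    # logical line.
    clean = re.sub(r"\s+", " ", reason).strip()
    # Remove any forged copy; deleting a missing header is a no-op.
    del msg[HEADER_NAME]
    msg[HEADER_NAME] = clean
    return msg.as_string()
```

Dropping a pre-existing header matters because a spammer could otherwise ship a message that already claims to be "legitimate" in its own explanation header.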

Caching: avoiding duplicate work

Bulk spam campaigns send identical content from hundreds of different sender addresses. Without caching, each copy would trigger a separate analysis.

Our caching system uses a hash of the subject line and body content (intentionally excluding the sender address, since bulk spam uses different senders for identical content). The first email in a campaign triggers the analysis. Every subsequent copy gets an instant cache hit.

Real example: the same home battery scam email arrived 6 times from different addresses. Only 1 analysis was performed. 5 cache hits.
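
A minimal sketch of this caching idea, assuming an in-memory dictionary and SHA-256 as the hash (the article does not name the hash function or the storage backend). ClassificationCache and content_key are hypothetical names.

```python
import hashlib


def content_key(subject: str, body: str) -> str:
    """Hash subject and body only; the sender is deliberately left out."""
    h = hashlib.sha256()
    h.update(subject.encode("utf-8"))
    h.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") differ
    h.update(body.encode("utf-8"))
    return h.hexdigest()


class ClassificationCache:
    def __init__(self, classify):
        self._classify = classify  # the expensive analysis function
        self._store = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, sender: str, subject: str, body: str):
        key = content_key(subject, body)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._classify(sender, subject, body)
        self._store[key] = result
        return result


# Six copies of one campaign from six different senders.
cache = ClassificationCache(lambda sender, subject, body: ("spam", 0.95))
for i in range(6):
    cache.lookup(f"sender{i}@example.com", "Home battery offer", "Act now")
print(cache.misses, cache.hits)  # 1 5
```

A production cache would also need an expiry policy and a size bound, which this sketch omits.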

What the user sees

The AI classification appears in the spam report alongside all other Rspamd symbols:

CLEANBOX_AI_SPAM    +3.29    AI content classifier detected spam/phishing patterns
BAYES_SPAM          +0.25    Message probably spam
HFILTER_HOSTNAME    +1.00    Unknown client hostname
RDNS_NONE           +0.50    No reverse DNS
---
Total:              +5.04

Without the AI, this email would have scored 1.75 — well below most spam thresholds. With the AI, it scores 5.04 — caught and quarantined or rejected depending on your threshold.
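
The arithmetic behind those two totals can be checked directly from the symbol scores in the report above. The helper name total is illustrative.

```python
# Symbol scores copied from the report above.
symbols = {
    "CLEANBOX_AI_SPAM": 3.29,
    "BAYES_SPAM": 0.25,
    "HFILTER_HOSTNAME": 1.00,
    "RDNS_NONE": 0.50,
}


def total(scores: dict, exclude=()) -> float:
    """Sum symbol scores, optionally leaving some symbols out."""
    return round(sum(v for k, v in scores.items() if k not in exclude), 2)


print(total(symbols))                                # 5.04
print(total(symbols, exclude={"CLEANBOX_AI_SPAM"}))  # 1.75
```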

Detection results

We tested the AI classifier against 28 real-world emails spanning 5 categories:

Category	Emails tested	Correct	AI confidence
Brand impersonation phishing	9	9/9	0.92-0.99
Sextortion and extortion	5	5/5	0.99-1.0
Financial scams (crypto, lottery, fake invoices)	4	4/4	0.92-0.99
Cold outreach (SEO, dev agencies)	2	2/2	0.85
Legitimate emails (PayPal, FedEx, GitHub, Stripe)	8	8/8	0.92-0.95

All 28 test emails were classified correctly (100% accuracy on this sample), with zero false positives among the 8 legitimate emails. The AI correctly identified every brand impersonation, every scam pattern, and every legitimate notification, including ones that Rspamd's Bayes classifier missed entirely.

Performance

AI analysis adds a small amount of processing time to the SMTP transaction, and the total typically stays well under a second. The caching system described above ensures that duplicate content is never analyzed twice, keeping overhead minimal even during bulk spam campaigns.

The architecture

The AI integration follows the same pattern as our existing crowd-sourced reputation system:

The SMTP server determines the AI verdict (via the classification engine)
The verdict is passed as HTTP headers to Rspamd
An Rspamd Lua postfilter reads the headers and inserts a scoring symbol
Rspamd returns the complete spam report including the AI symbol

This keeps all scoring within Rspamd and maintains consistency with existing CLEANBOX_* symbols. The AI symbol appears in the dashboard just like any other rule, so no special UI code was needed.
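
The second step, passing the verdict to Rspamd as HTTP headers, might look roughly like the Python sketch below. The /checkv2 endpoint and port 11333 are Rspamd's standard scan interface, but the three X-Cleanbox-AI-* header names are assumptions made for this example; the names Cleanbox actually uses are not stated here. On the Rspamd side, a Lua postfilter could read such values with task:get_request_header.

```python
import urllib.request

# Rspamd's scan endpoint on its default normal-worker port.
RSPAMD_URL = "http://127.0.0.1:11333/checkv2"


def verdict_headers(label: str, confidence: float, reason: str) -> dict:
    """Encode an AI verdict as HTTP request headers (hypothetical names)."""
    if label not in ("spam", "ham"):
        raise ValueError("label must be 'spam' or 'ham'")
    # Header values must be single-line; keep them ASCII to be safe.
    one_line = " ".join(reason.split())
    safe_reason = one_line.encode("ascii", "replace").decode("ascii")
    return {
        "X-Cleanbox-AI-Verdict": label,
        "X-Cleanbox-AI-Confidence": f"{confidence:.2f}",
        "X-Cleanbox-AI-Reason": safe_reason,
    }


def build_scan_request(raw_message: bytes, headers: dict) -> urllib.request.Request:
    """Build (but do not send) the scan request carrying the verdict."""
    return urllib.request.Request(
        RSPAMD_URL, data=raw_message, headers=headers, method="POST"
    )
```

Sending the request with urllib.request.urlopen would return Rspamd's JSON scan result, which is where the AI symbol would appear alongside the other symbols.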

What this means for you

If you use Cleanbox, AI spam detection is active on your account. You do not need to configure anything. It runs automatically alongside all existing spam detection.

In your spam reports, look for CLEANBOX_AI_SPAM or CLEANBOX_AI_HAM symbols. In delivered emails, check the X-Cleanbox-Explanation header for the AI's reasoning.

You can also use the AI symbols in filter rules: create a filter matching CLEANBOX_AI_SPAM with a deny action for zero-tolerance AI-detected spam blocking.

Traditional spam filters count words. Ours reads the email and understands what it says. That is the difference between catching 95% of spam and catching 99%+.