
Spam Filters Explained: How Email Providers Decide What Is Spam

Jan 6, 2025

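By some estimates, over 160 billion spam emails are sent worldwide every day. Published figures for spam's share of all email traffic vary widely with the source and the point of measurement, from roughly half to as much as 85%. Yet most of us see only a handful of spam messages per week. The rest is silently filtered out before it reaches our inbox.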

How does that filtering work? What happens inside a spam filter? And why does it sometimes flag legitimate emails as spam?

The layers of spam detection

Modern spam filters are not a single check. They are a stack of independent detection systems, each contributing a score. The scores are combined, and if the total exceeds a threshold, the message is flagged as spam.

Layer 1: Connection-level checks

Before the email content is even transmitted, the receiving server evaluates the connection itself:

IP reputation — The sending server's IP is checked against DNS-based blackhole lists (DNSBLs). Services like Spamhaus, Barracuda, and SpamCop maintain databases of known spam-sending IPs. A connection from a listed IP may be rejected outright.
Reverse DNS — Does the sending IP have a valid reverse DNS record? Legitimate mail servers almost always do. Spam servers often do not.
Rate limiting — Is this server sending an unusually high volume of email? Sudden spikes suggest a compromised server being used for spam.
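
As a rough illustration of the IP reputation check, a DNSBL lookup is an ordinary DNS query: the IPv4 octets are reversed and the list's zone is appended. The sketch below assumes a placeholder zone, dnsbl.example.org, and the helper names are made up for this example. Real lists publish their own zones, return codes, and usage terms.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name a DNSBL lookup queries for an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = "dnsbl.example.org") -> bool:
    """Return True if the (placeholder) zone has an A record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True  # any answer is treated as "listed" in this sketch
    except socket.gaierror:
        return False  # no record, or the lookup itself failed


print(dnsbl_query_name("192.0.2.25", "dnsbl.example.org"))
# 25.2.0.192.dnsbl.example.org
```

Treating every lookup failure as "not listed" is a simplification. A production filter would distinguish timeouts from negative answers and would inspect the returned address, which many lists use to encode the reason for a listing.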

Layer 2: Authentication

Three standards verify the sender identity (see our complete authentication guide):

SPF — Is the sending server authorized to send email for this domain?
DKIM — Does the email have a valid cryptographic signature proving it was not tampered with?
DMARC — Do SPF and DKIM results align with the From domain?

Failed authentication does not always mean spam — some legitimate senders misconfigure their DNS. But it significantly increases the spam score.
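
The DMARC alignment idea can be sketched in a few lines. This is a simplified model, not a full implementation: it assumes the SPF and DKIM verdicts have already been computed elsewhere, and it approximates the organizational domain as the last two labels of a hostname. Real implementations consult the Public Suffix List, so this shortcut gives wrong answers for domains under suffixes such as co.uk. The function names are hypothetical.

```python
def org_domain(domain: str) -> str:
    """Naive organizational domain: the last two labels (an assumption)."""
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def dmarc_passes(from_domain: str,
                 spf_pass: bool, spf_domain: str,
                 dkim_pass: bool, dkim_domain: str) -> bool:
    """Relaxed alignment: one mechanism must pass AND match the From domain."""
    target = org_domain(from_domain)
    spf_aligned = spf_pass and org_domain(spf_domain) == target
    dkim_aligned = dkim_pass and org_domain(dkim_domain) == target
    return spf_aligned or dkim_aligned


# SPF passed for a subdomain of the From domain, so the message is aligned.
print(dmarc_passes("example.com", True, "bounce.example.com", False, ""))
# True

# Both checks passed, but for an unrelated domain, so DMARC still fails.
print(dmarc_passes("example.com", True, "mailer.example.net", True, "mailer.example.net"))
# False
```

The second case shows why alignment matters: a spammer can pass SPF and DKIM for a domain they control, but that does not authenticate the address shown in the From header.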

Layer 3: Content analysis

This is the core of spam detection. The email headers and body are analyzed using multiple techniques:

Bayesian classification

This is often the most influential technique. A Bayesian classifier is a machine learning model trained on millions of emails labeled as spam or legitimate (ham). It learns which words, phrases, and patterns are associated with each category.

For example, the classifier learns that emails containing "act now", "limited time offer", and "click here to claim" are statistically more likely to be spam. But it also learns context — "limited time offer" in a promotional email from Amazon is different from the same phrase in a Nigerian prince scam.

The model is not static. It improves continuously as users provide feedback (marking messages as spam or not-spam). This is why your spam filter gets better over time.
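
To make the idea concrete, here is a toy naive Bayes classifier with add-one smoothing. It is a minimal sketch trained on four invented messages, not how any particular provider implements its filter. The class name, the whitespace tokenizer, and the training data are all assumptions made for illustration.

```python
import math
from collections import Counter


class NaiveBayes:
    """Toy word-level classifier; needs at least one example per label."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            total_words = sum(self.word_counts[label].values())
            # Start from the prior, then add one log-likelihood per word.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in text.lower().split():
                count = self.word_counts[label][word]
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((count + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # Normalize the two log scores into P(spam | text).
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


clf = NaiveBayes()
clf.train("act now limited time offer", "spam")
clf.train("click here to claim your prize", "spam")
clf.train("meeting notes attached for review", "ham")
clf.train("lunch at noon tomorrow", "ham")

print(clf.spam_probability("claim your limited time offer now") > 0.5)
# True
print(clf.spam_probability("notes for the meeting tomorrow") > 0.5)
# False
```

User feedback fits naturally into this model: marking a message as spam or not-spam amounts to another call to train, which shifts the word counts for future messages.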

Header analysis

Missing or forged headers (Date, Message-ID, From)
Suspicious sending software signatures (X-Mailer header)
Mismatched From and Reply-To addresses
Encoded headers designed to bypass filters
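
Two of these checks, missing headers and a From/Reply-To mismatch, are easy to sketch with Python's standard email package. The finding names below (MISSING_DATE and so on) are invented for this example and are not symbols from any real filter. The comparison is done at the domain level, which is one reasonable interpretation of "mismatched".

```python
from email import message_from_string
from email.utils import parseaddr


def header_findings(raw_message: str) -> list[str]:
    """Return hypothetical finding names for a raw RFC 5322 message."""
    msg = message_from_string(raw_message)
    findings = []
    for name in ("Date", "Message-ID", "From"):
        if msg[name] is None:
            findings.append(f"MISSING_{name.upper().replace('-', '_')}")
    # parseaddr returns (display name, address); only the address matters here.
    from_addr = parseaddr(msg.get("From", ""))[1].lower()
    reply_addr = parseaddr(msg.get("Reply-To", ""))[1].lower()
    if from_addr and reply_addr:
        from_domain = from_addr.rpartition("@")[2]
        reply_domain = reply_addr.rpartition("@")[2]
        if from_domain != reply_domain:
            findings.append("REPLY_TO_DOMAIN_MISMATCH")
    return findings


raw = (
    "From: Support <support@example.com>\n"
    "Reply-To: claims@example.net\n"
    "Subject: Your account\n"
    "\n"
    "Hello\n"
)
print(header_findings(raw))
# ['MISSING_DATE', 'MISSING_MESSAGE_ID', 'REPLY_TO_DOMAIN_MISMATCH']
```

A differing Reply-To is common in legitimate mail too (helpdesk systems, mailing lists), so a finding like this would normally add a small score rather than decide the outcome by itself.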

URL analysis

Links pointing to known phishing domains
URL shorteners (often used to hide malicious destinations)
Domains registered very recently (common for disposable spam infrastructure)
Lookalike domains (character substitutions such as paypa1.com vs paypal.com, and homograph attacks that use visually similar Unicode characters)
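
A few of these URL checks can be sketched with the standard library alone. Everything configurable here is an assumption for illustration: the shortener list is a small sample, the protected-brand list holds a single domain, the substitution table covers only three characters, and the finding names are hypothetical. Checking domain age would need registration data from an external source, so it is left out.

```python
from urllib.parse import urlsplit

SHORTENERS = {"bit.ly", "tinyurl.com", "t.co"}  # illustrative, not exhaustive
PROTECTED = {"paypal.com"}                      # hypothetical brand list
LOOKALIKES = str.maketrans({"1": "l", "0": "o", "5": "s"})


def url_findings(url: str) -> list[str]:
    """Return hypothetical finding names for one URL."""
    host = (urlsplit(url).hostname or "").lower()
    findings = []
    if host in SHORTENERS:
        findings.append("URL_SHORTENER")
    # Undo common digit-for-letter swaps and see if a protected brand appears.
    if host not in PROTECTED and host.translate(LOOKALIKES) in PROTECTED:
        findings.append("LOOKALIKE_DOMAIN")
    # Internationalized hostnames are encoded with an "xn--" label prefix.
    if host.startswith("xn--") or ".xn--" in host:
        findings.append("PUNYCODE_HOST")
    return findings


print(url_findings("https://paypa1.com/login"))
# ['LOOKALIKE_DOMAIN']
print(url_findings("https://bit.ly/abc"))
# ['URL_SHORTENER']
print(url_findings("https://paypal.com/"))
# []
```

The lookalike check compares whole hostnames, so a subdomain such as www.paypa1.com would slip past this sketch. A fuller version would compare registrable domains and use a much larger confusable-character table.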

Structural analysis

Image-to-text ratio (spam often uses images to avoid text-based detection)
HTML complexity (overly nested tables, hidden text, CSS tricks)
Attachment types (executable files, password-protected archives)
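
The image-to-text and attachment checks can be approximated as follows. This is a rough sketch under stated assumptions: the 20-character text minimum and the extension list are arbitrary illustrative values, the helper names are invented, and the parser counts text inside script or style elements as visible text, which a real filter would exclude.

```python
import os
from html.parser import HTMLParser

RISKY_EXTENSIONS = {".exe", ".scr", ".js", ".bat"}  # illustrative sample


class StructureStats(HTMLParser):
    """Counts <img> tags and non-whitespace text characters."""

    def __init__(self):
        super().__init__()
        self.images = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1

    def handle_data(self, data):
        self.text_chars += len(data.strip())


def looks_image_heavy(html: str, min_text_chars: int = 20) -> bool:
    """True when the body has images but almost no text to classify."""
    stats = StructureStats()
    stats.feed(html)
    stats.close()
    return stats.images > 0 and stats.text_chars < min_text_chars


def has_risky_attachment(filenames: list[str]) -> bool:
    """True when any filename ends in an extension from the sample list."""
    return any(os.path.splitext(name.lower())[1] in RISKY_EXTENSIONS
               for name in filenames)


print(looks_image_heavy('<html><body><img src="offer.png"></body></html>'))
# True
print(has_risky_attachment(["invoice.pdf.exe"]))
# True
```

Only the final extension is checked, which is what catches double extensions like invoice.pdf.exe. Password-protected archives cannot be detected from the filename and would need inspection of the file itself.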

Layer 4: Reputation systems

Beyond individual email analysis, reputation systems track sender behavior over time:

Sender reputation — How have other recipients reacted to emails from this sender? If thousands of people mark emails from @sketchy-deals.com as spam, future emails from that domain start with a high spam score.
Domain age — Newly registered domains are more suspicious. Legitimate businesses rarely send email from domains created yesterday.
Volume patterns — A sudden spike in email volume from a previously quiet sender suggests a compromised account or a spam campaign.
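
Two of these signals reduce to simple arithmetic, sketched below. The tenfold spike factor, the 100-message floor, and the function names are assumptions chosen for the example. Real reputation systems weigh many more inputs and do not publish their thresholds.

```python
from statistics import mean


def complaint_rate(spam_reports: int, delivered: int) -> float:
    """Fraction of delivered messages that recipients marked as spam."""
    return spam_reports / delivered if delivered else 0.0


def is_volume_spike(history: list[int], today: int,
                    factor: float = 10.0, floor: int = 100) -> bool:
    """Flag today's volume if it dwarfs the sender's recent daily average."""
    baseline = mean(history) if history else 0.0
    # The floor keeps tiny senders from tripping the check on small numbers.
    return today >= floor and today > factor * max(baseline, 1.0)


print(complaint_rate(30, 10_000))
# 0.003
print(is_volume_spike([20, 35, 25, 30], 10_000))
# True
print(is_volume_spike([900, 1100, 1000], 1200))
# False
```

The first sender averaged under 30 messages a day before jumping to 10,000, which is the pattern the list above describes. The second sender's increase is within normal variation and is not flagged.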

How scores work

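Each layer produces named results (called symbols in systems like Rspamd), and each symbol carries a score. Negative scores count in the message's favor and positive scores count against it. The scores are summed into a total spam score: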

Symbol	Score	Meaning
DKIM_PASS	-0.5	DKIM signature is valid (good sign)
SPF_FAIL	+2.0	Sending server not authorized (bad sign)
BAYES_SPAM	+5.0	Bayesian classifier says spam
BAYES_HAM	-3.0	Bayesian classifier says legitimate
URL_PHISHING	+7.0	Contains known phishing URL
SENDER_TRUSTED	-2.0	Many users trust this sender

A message with DKIM_PASS (-0.5) + SPF_FAIL (+2.0) + BAYES_HAM (-3.0) = -1.5 → legitimate.

A message with SPF_FAIL (+2.0) + BAYES_SPAM (+5.0) + URL_PHISHING (+7.0) = +14.0 → definite spam.
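
The two worked examples can be reproduced in a few lines. The symbol scores come from the examples in this section. The threshold of 6.0 is a hypothetical value picked for illustration, since real filters let administrators or users choose their own.

```python
# Scores as used in this section's examples.
SCORES = {
    "DKIM_PASS": -0.5,
    "SPF_FAIL": 2.0,
    "BAYES_SPAM": 5.0,
    "BAYES_HAM": -3.0,
    "URL_PHISHING": 7.0,
    "SENDER_TRUSTED": -2.0,
}
SPAM_THRESHOLD = 6.0  # hypothetical; not taken from any real configuration


def total_score(symbols: list[str]) -> float:
    """Sum the scores of every symbol that fired for a message."""
    return sum(SCORES[name] for name in symbols)


def is_spam(symbols: list[str], threshold: float = SPAM_THRESHOLD) -> bool:
    """A message is flagged when its total exceeds the threshold."""
    return total_score(symbols) > threshold


print(total_score(["DKIM_PASS", "SPF_FAIL", "BAYES_HAM"]))
# -1.5
print(total_score(["SPF_FAIL", "BAYES_SPAM", "URL_PHISHING"]))
# 14.0
print(is_spam(["SPF_FAIL", "BAYES_SPAM"], threshold=8.0))
# False
```

The last call shows what adjusting the threshold does: a message scoring 7.0 is flagged at the default of 6.0 but passes once the threshold is raised to 8.0.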

Why spam filters get it wrong

False positives (legitimate email marked as spam)

New sender — No reputation data, default score is slightly elevated
Poor authentication — The sender's IT team misconfigured SPF or DKIM
Marketing language — Legitimate promotions use the same phrases as spam
Shared hosting — Your IP is on a server with other senders who send spam

False negatives (spam that gets through)

New spam campaign — The classifier has not seen this pattern before
Compromised legitimate account — Spam from a trusted sender domain
Image-only spam — No text for the classifier to analyze
Snowshoe spam — Low volume from many different IPs to avoid blacklists

What you can do

As a recipient

Always provide feedback. Mark spam as spam, and mark false positives as not-spam. This trains the Bayesian classifier.
Whitelist trusted senders. Do not rely on the filter to get it right for important contacts — guarantee delivery by whitelisting.
Adjust your threshold. If your filter is too aggressive (many false positives), raise the spam threshold. If too much spam gets through, lower it.
Use a filter with transparency. Choose a provider that shows you the full spam report — every symbol, every score. Understanding why helps you tune the filter.

As a sender

Set up SPF, DKIM, and DMARC. This is table stakes today. Without them, your emails are penalized by most filters, and some large providers reject unauthenticated bulk mail outright.
Warm up new domains. Do not send 10,000 emails from a brand new domain on day one. Start low and build reputation gradually.
Honor unsubscribe requests. Include a List-Unsubscribe header (RFC 2369) and support one-click unsubscribe with the List-Unsubscribe-Post header (RFC 8058). Getting marked as spam is far worse for your reputation than losing a subscriber.
Monitor your sender reputation. Use Google Postmaster Tools, Microsoft SNDS, and blacklist monitoring to catch problems early.
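
For the unsubscribe advice, the headers for one-click unsubscribe look like this when built with Python's standard email package. The addresses, subject, URL, and function name are placeholders. The header names and the List-Unsubscribe=One-Click value are the ones RFC 8058 specifies.

```python
from email.message import EmailMessage


def build_newsletter(sender: str, recipient: str,
                     unsubscribe_url: str) -> EmailMessage:
    """Build a message carrying one-click unsubscribe headers."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Monthly update"
    # The HTTPS URL goes in angle brackets; it receives the one-click POST.
    msg["List-Unsubscribe"] = f"<{unsubscribe_url}>"
    msg["List-Unsubscribe-Post"] = "List-Unsubscribe=One-Click"
    msg.set_content("Hello, here is this month's update.")
    return msg


msg = build_newsletter(
    "news@example.com",
    "reader@example.org",
    "https://example.com/unsubscribe?token=abc123",
)
print(msg["List-Unsubscribe-Post"])
# List-Unsubscribe=One-Click
```

RFC 8058 also expects the message to carry a valid DKIM signature that covers both headers, so this only helps once the authentication setup described above is in place.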

The arms race continues

Spam filtering is an arms race. Spammers evolve their techniques, filters adapt, spammers evolve again. The good news is that the defense is winning — the combination of authentication, machine learning, and crowd-sourced reputation catches the vast majority of spam before you ever see it.

The bad news is that it will never be perfect. The goal is not zero spam — it is making spam so rare and so powerless that it no longer affects your day.