Understanding the spam scoring system in detail

Every email processed by Cleanbox receives a spam score calculated by Rspamd, an open-source spam filtering engine. This article explains how the score is built, what each component means, and how the system improves over time.

Score components

The spam score is the sum of the scores of individual symbols. Each symbol records the outcome of a specific check, such as an SPF result or a content match. Positive scores indicate spam signals; negative scores indicate legitimate signals.

Authentication symbols

Symbol	Score direction	Meaning
SPF_PASS	Negative	SPF authentication passed
SPF_FAIL	Positive	SPF authentication failed
DKIM_PASS	Negative	DKIM signature valid
DKIM_FAIL	Positive	DKIM signature invalid or missing
DMARC_PASS	Negative	DMARC alignment passed
DMARC_FAIL	Positive	DMARC alignment failed

Content symbols

Symbol	Meaning
BAYES_SPAM	Bayesian classifier identified spam patterns in content
BAYES_HAM	Bayesian classifier identified legitimate patterns
URL_PHISHING	Email contains links to known phishing domains
URL_SHORTENED	Email contains shortened URLs (bit.ly, etc.)

Cleanbox custom symbols

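These symbols are unique to Cleanbox. The reputation symbols (block, quarantine, greylist, trusted) are injected based on crowd-sourced sender data; CLEANBOX_UNCOMMON and CLEANBOX_BULK_ESP reflect the sender's history and sending infrastructure: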

Symbol	Score impact	Trigger
CLEANBOX_BLOCK	+8.0	10+ users reported sender, 90%+ spam ratio
CLEANBOX_QUARANTINE	+5.0	5+ users reported sender, 70%+ spam ratio
CLEANBOX_GREYLIST	+2.0	3+ users reported sender, more spam than ham
CLEANBOX_TRUSTED	-2.0	5+ teams have this sender whitelisted or prioritized
CLEANBOX_UNCOMMON	+1.0	Sender has never emailed any Cleanbox user before
CLEANBOX_BULK_ESP	+0.5	Sent via a known bulk email service provider

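The feedback-driven tiers in the table can be read as a simple decision function. The sketch below is a hypothetical illustration, not Cleanbox's actual code. It assumes that "users reported sender" counts unique users who gave any feedback, that the spam ratio is spam reports divided by all reports, and that the most severe matching tier wins. CLEANBOX_TRUSTED, CLEANBOX_UNCOMMON, and CLEANBOX_BULK_ESP depend on other data and are left out.

```python
from typing import Optional, Tuple


def reputation_symbol(spam_users: int, ham_users: int) -> Optional[Tuple[str, float]]:
    """Map unique-user feedback counts for one sender to a CLEANBOX_* symbol.

    ASSUMPTIONS (hypothetical, not confirmed Cleanbox logic):
    - "users reported sender" = unique users who gave any feedback;
    - spam ratio = spam_users / (spam_users + ham_users);
    - tiers are checked from most to least severe and the first match wins.
    """
    reporters = spam_users + ham_users
    if reporters == 0:
        return None  # no feedback yet, no reputation symbol
    ratio = spam_users / reporters
    if reporters >= 10 and ratio >= 0.90:
        return ("CLEANBOX_BLOCK", 8.0)
    if reporters >= 5 and ratio >= 0.70:
        return ("CLEANBOX_QUARANTINE", 5.0)
    if reporters >= 3 and spam_users > ham_users:
        return ("CLEANBOX_GREYLIST", 2.0)
    return None
```

Under these assumptions, a sender with 8 spam reports and 2 ham reports has 10 reporters but only an 80% spam ratio, so it lands in the quarantine tier rather than the block tier.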
How reputation is built

Cleanbox reputation symbols are powered by aggregated user feedback:

A user receives a message and marks it as spam (thumbs down) or not spam (thumbs up)
The feedback is recorded per sender address and per sender domain
Aggregation happens at both levels:
- Address level: feedback specific to newsletter@spamshop.com
- Domain level: feedback aggregated across all addresses at @spamshop.com (skipped for freemail domains like gmail.com)
Based on the ratio of spam to ham reports and the number of unique users reporting, a recommendation (block, quarantine, or greylist) is generated using the thresholds in the table above
The recommendation is passed to Rspamd as a custom header, which triggers the corresponding CLEANBOX_* symbol
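The two-level aggregation described above might look roughly like this. It is a minimal sketch under stated assumptions: the freemail list is illustrative, feedback arrives as hypothetical (user, sender, is_spam) tuples, and a user's latest verdict replaces any earlier one. The resulting per-address and per-domain counts are what a recommendation step would consume.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# ASSUMPTION: illustrative freemail list, not Cleanbox's real configuration.
FREEMAIL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}

# Hypothetical feedback record: (user_id, sender_address, is_spam)
Feedback = Tuple[str, str, bool]
Counts = Dict[str, Tuple[int, int]]  # key -> (spam_users, ham_users)


def _tally(votes: Dict[str, Dict[str, bool]]) -> Counts:
    """Turn {key: {user_id: is_spam}} into {key: (spam_users, ham_users)}."""
    result: Counts = {}
    for key, users in votes.items():
        spam = sum(1 for is_spam in users.values() if is_spam)
        result[key] = (spam, len(users) - spam)
    return result


def aggregate(feedback: Iterable[Feedback]) -> Tuple[Counts, Counts]:
    """Aggregate feedback at address level and at domain level."""
    by_address: Dict[str, Dict[str, bool]] = defaultdict(dict)
    by_domain: Dict[str, Dict[str, bool]] = defaultdict(dict)
    for user_id, sender, is_spam in feedback:
        address = sender.strip().lower()
        domain = address.rsplit("@", 1)[-1]
        # ASSUMPTION: a user's latest verdict for a sender replaces earlier ones,
        # so each user counts at most once per address and per domain.
        by_address[address][user_id] = is_spam
        # Domain-level aggregation is skipped for freemail domains.
        if domain not in FREEMAIL_DOMAINS:
            by_domain[domain][user_id] = is_spam
    return _tally(by_address), _tally(by_domain)
```

For example, two users flagging newsletter@spamshop.com and one user approving deals@spamshop.com would give (2, 0) and (0, 1) at address level and (2, 1) for spamshop.com, while feedback about a gmail.com sender is only counted at address level.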

Bayesian learning

When a user provides feedback, the raw email is also sent to Rspamd for Bayes training:

Thumbs down (spam) → Rspamd learns this message pattern as spam
Thumbs up (not spam) → Rspamd learns this message pattern as legitimate

This training is permanent and cumulative. The more feedback users provide, the more accurate the Bayes classifier becomes. Over time, the classifier builds a statistical model of what spam looks like in the Cleanbox ecosystem.
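For reference, Rspamd exposes Bayes training through its controller worker, via the same learnspam and learnham endpoints used by `rspamc learn_spam` and `rspamc learn_ham`. The sketch below shows one way a service could forward a thumbs up or thumbs down to it. The controller URL and password are placeholders, and how Cleanbox actually submits feedback is an assumption.

```python
import urllib.request

# ASSUMPTIONS: placeholder controller address (Rspamd's controller worker
# listens on port 11334 by default) and a hypothetical password.
RSPAMD_CONTROLLER = "http://localhost:11334"
CONTROLLER_PASSWORD = "change-me"


def build_learn_request(raw_message: bytes, is_spam: bool) -> urllib.request.Request:
    """Build a POST of the raw message to /learnspam or /learnham."""
    endpoint = "/learnspam" if is_spam else "/learnham"
    return urllib.request.Request(
        RSPAMD_CONTROLLER + endpoint,
        data=raw_message,
        headers={"Password": CONTROLLER_PASSWORD},
        method="POST",
    )


def send_feedback(raw_message: bytes, is_spam: bool) -> int:
    """Thumbs down -> learn as spam; thumbs up -> learn as ham.

    Returns the HTTP status; urlopen raises HTTPError for 4xx/5xx replies.
    """
    request = build_learn_request(raw_message, is_spam)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Building the request separately from sending it keeps the mapping from feedback to endpoint easy to check without a running Rspamd instance.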

Reading a spam report

Every message in Cleanbox includes a spam report accessible from the message detail page. The report shows:

Total score — The sum of all symbol scores
Symbol list — Every rule that triggered, with its individual score
Authentication results — SPF, DKIM, DMARC pass/fail

Example report

Total score: 7.3

Symbols:
  SPF_FAIL            +2.0
  DKIM_FAIL           +1.5
  BAYES_SPAM          +3.0
  CLEANBOX_UNCOMMON   +1.0
  URL_SHORTENED       +0.8
  MIME_HTML_ONLY      -0.5
  HAS_LIST_UNSUB      -0.5

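This message scored 7.3 because SPF and DKIM both failed (+3.5), the content matched spam patterns (+3.0), it came from a sender that had never emailed a Cleanbox user before (+1.0), and it contained shortened URLs (+0.8). Those spam signals add up to +8.3. Two negative symbols, MIME_HTML_ONLY (an HTML-only body) and HAS_LIST_UNSUB (a List-Unsubscribe header), are treated as signs of legitimate bulk mail and together subtract 1.0, for a total of 7.3.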
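Because the total is just the sum of the triggered symbol scores, the arithmetic in the example can be checked directly. The symbol names and scores are copied from the example report; the helper names are illustrative only.

```python
from typing import Dict, Tuple

# Symbol scores copied from the example report above.
EXAMPLE_REPORT: Dict[str, float] = {
    "SPF_FAIL": 2.0,
    "DKIM_FAIL": 1.5,
    "BAYES_SPAM": 3.0,
    "CLEANBOX_UNCOMMON": 1.0,
    "URL_SHORTENED": 0.8,
    "MIME_HTML_ONLY": -0.5,
    "HAS_LIST_UNSUB": -0.5,
}


def total_score(symbols: Dict[str, float]) -> float:
    """Total score = sum of every triggered symbol's score (rounded for display)."""
    return round(sum(symbols.values()), 2)


def breakdown(symbols: Dict[str, float]) -> Tuple[float, float]:
    """Split the total into spam signals (positive) and legitimate signals (negative)."""
    positive = sum(s for s in symbols.values() if s > 0)
    negative = sum(s for s in symbols.values() if s < 0)
    return round(positive, 2), round(negative, 2)


print(total_score(EXAMPLE_REPORT))  # prints 7.3
print(breakdown(EXAMPLE_REPORT))    # prints (8.3, -1.0)
```

Rounding is only for display; it hides the tiny floating-point error that comes from adding values such as 0.8.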

How thresholds use the score

Each address has two configurable thresholds:

Quarantine threshold — Score at which messages are held for review
Reject threshold — Score at which messages are rejected outright
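As an illustration, the two thresholds could be applied as shown below. This is a hypothetical sketch: it assumes both thresholds are inclusive and that the reject threshold is checked first, and the values 5.0 and 10.0 are example settings, not recommendations.

```python
def action_for(score: float, quarantine_threshold: float, reject_threshold: float) -> str:
    """Map a spam score to an action for one address.

    ASSUMPTIONS: thresholds are inclusive (>=), and the reject threshold is
    checked first because it is normally set higher than the quarantine one.
    """
    if score >= reject_threshold:
        return "reject"
    if score >= quarantine_threshold:
        return "quarantine"
    return "deliver"


print(action_for(7.3, quarantine_threshold=5.0, reject_threshold=10.0))  # prints quarantine
```

With these example settings, the 7.3 message from the example report would be held for review rather than rejected outright.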

See Choosing the Right Spam Threshold for recommendations per use case.

Relay-specific scoring

Relay addresses receive two additional checks that aliases do not:

DNSBL (IP blacklists) — The sending server's IP address is checked against Spamhaus, Barracuda, and SpamCop. A blacklisted IP results in immediate rejection before content is analyzed.
Symbol rules — You can configure specific Rspamd symbols that trigger automatic rejection on relay addresses (e.g., always reject if URL_PHISHING is present).

These additional checks reflect the higher security requirements of relay-protected addresses, which typically handle business-critical email.
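The order of the relay checks can be sketched as follows. A DNSBL lookup reverses the IPv4 octets and queries them under the list's DNS zone; the zones below are the public ones for Spamhaus ZEN, Barracuda, and SpamCop, but whether Cleanbox queries these exact zones, and all function names and parameters, are assumptions. The content scan is passed in as a callable so it only runs when the IP is not listed, and only IPv4 is handled.

```python
import ipaddress
import socket
from typing import Callable, List, Set, Tuple

# ASSUMPTION: public zones for the lists named above; Cleanbox's exact zones are not documented.
DNSBL_ZONES = ("zen.spamhaus.org", "b.barracudacentral.org", "bl.spamcop.net")


def dnsbl_query_names(ip: str) -> List[str]:
    """Reverse the IPv4 octets and append each zone: 192.0.2.1 -> 1.2.0.192.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    reversed_ip = ".".join(reversed(octets))
    return [f"{reversed_ip}.{zone}" for zone in DNSBL_ZONES]


def is_listed(ip: str) -> bool:
    """Return True if any list answers with a 127.x.x.x listing code (needs network)."""
    for name in dnsbl_query_names(ip):
        try:
            answer = socket.gethostbyname(name)
        except OSError:
            continue  # NXDOMAIN (not listed) or lookup failure
        # Spamhaus uses 127.255.255.x for error replies (e.g. queries via
        # public resolvers), so those are not treated as listings.
        if answer.startswith("127.") and not answer.startswith("127.255.255."):
            return True
    return False


def relay_verdict(
    listed: bool,
    scan: Callable[[], Tuple[Set[str], float]],
    reject_symbols: Set[str],
    quarantine_threshold: float,
    reject_threshold: float,
) -> str:
    """Hypothetical relay pipeline: DNSBL, then symbol rules, then thresholds."""
    # 1. DNSBL hit: reject immediately; the content scan never runs.
    if listed:
        return "reject"
    symbols, score = scan()
    # 2. Symbol rules: any configured symbol forces rejection.
    if reject_symbols & symbols:
        return "reject"
    # 3. Otherwise apply the address's thresholds (assumed inclusive).
    if score >= reject_threshold:
        return "reject"
    if score >= quarantine_threshold:
        return "quarantine"
    return "deliver"
```

A caller would run `is_listed(sender_ip)` first and pass the result in, so a blacklisted server is rejected without any content analysis, matching the order described above.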