Understanding the spam scoring system in detail
Every email processed by Cleanbox receives a spam score calculated by Rspamd, an open-source spam filtering engine. This article explains how scores are built, what each component means, and how the system improves over time.
Score components
The spam score is a sum of individual symbols. Each symbol represents a specific check that either passed or failed. Positive scores indicate spam signals. Negative scores indicate legitimate signals.
Authentication symbols
| Symbol | Score | Meaning |
|---|---|---|
| SPF_PASS | Negative | SPF authentication passed |
| SPF_FAIL | Positive | SPF authentication failed |
| DKIM_PASS | Negative | DKIM signature valid |
| DKIM_FAIL | Positive | DKIM signature invalid or missing |
| DMARC_PASS | Negative | DMARC alignment passed |
| DMARC_FAIL | Positive | DMARC alignment failed |
Content symbols
| Symbol | Meaning |
|---|---|
| BAYES_SPAM | Bayesian classifier identified spam patterns in content |
| BAYES_HAM | Bayesian classifier identified legitimate patterns |
| URL_PHISHING | Email contains links to known phishing domains |
| URL_SHORTENED | Email contains shortened URLs (bit.ly, etc.) |
Cleanbox custom symbols
These are unique to Cleanbox, injected based on crowd-sourced sender reputation data:
| Symbol | Score impact | Trigger |
|---|---|---|
| CLEANBOX_BLOCK | +8.0 | 10+ users reported sender, 90%+ spam ratio |
| CLEANBOX_QUARANTINE | +5.0 | 5+ users reported sender, 70%+ spam ratio |
| CLEANBOX_GREYLIST | +2.0 | 3+ users reported sender, more spam than ham |
| CLEANBOX_TRUSTED | -2.0 | 5+ teams have this sender whitelisted or prioritized |
| CLEANBOX_UNCOMMON | +1.0 | Sender has never emailed any Cleanbox user before |
| CLEANBOX_BULK_ESP | +0.5 | Sent via a known bulk email service provider |
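The three report-driven triggers in the table can be sketched as a small decision function. This is a minimal illustration, not Cleanbox's implementation: the function name, the definition of "spam ratio" as spam reports divided by all reports, and the strictest-rule-first ordering are all assumptions.

```python
from typing import Optional

# (symbol, minimum unique reporters, minimum spam ratio), taken from the table.
# Checking the strictest rule first is an assumption.
RATIO_RULES = [
    ("CLEANBOX_BLOCK", 10, 0.90),
    ("CLEANBOX_QUARANTINE", 5, 0.70),
]


def reputation_symbol(unique_reporters: int, spam_reports: int,
                      ham_reports: int) -> Optional[str]:
    """Return the report-driven CLEANBOX_* symbol for a sender, or None."""
    total = spam_reports + ham_reports
    if total == 0:
        return None
    # Assumption: "spam ratio" means spam reports / all reports.
    ratio = spam_reports / total
    for symbol, min_users, min_ratio in RATIO_RULES:
        if unique_reporters >= min_users and ratio >= min_ratio:
            return symbol
    # Greylist: 3+ reporters and more spam than ham.
    if unique_reporters >= 3 and spam_reports > ham_reports:
        return "CLEANBOX_GREYLIST"
    return None
```

Under these assumptions, a sender reported by 12 users with 8 spam and 4 ham reports falls through the first two rules (its ratio is about 67%) and lands on CLEANBOX_GREYLIST.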
How reputation is built
Cleanbox reputation symbols are powered by aggregated user feedback:
- A user receives a message and marks it as spam (thumbs down) or not spam (thumbs up)
- The feedback is recorded per sender address and per sender domain
- Aggregation happens at both levels:
  - Address level: feedback specific to newsletter@spamshop.com
  - Domain level: feedback aggregated across all addresses at @spamshop.com (skipped for freemail domains like gmail.com)
- Based on the ratio of spam vs. ham reports and the number of unique users reporting, a recommendation is generated
- The recommendation is passed to Rspamd as a custom header, which triggers the corresponding CLEANBOX_* symbol
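The two aggregation levels described above can be sketched as follows. The data layout, the `aggregate` function, and the one-entry freemail list are hypothetical; the article does not describe the real storage or the full freemail list.

```python
from collections import defaultdict

# Illustrative only: the real freemail list is not given in this article.
FREEMAIL_DOMAINS = {"gmail.com"}


def aggregate(feedback):
    """Aggregate (user_id, sender_address, is_spam) tuples per address and per domain."""
    stats = defaultdict(lambda: {"spam": 0, "ham": 0, "users": set()})
    for user_id, sender, is_spam in feedback:
        sender = sender.lower()
        domain = sender.rsplit("@", 1)[-1]
        keys = [sender]  # address level always applies
        if domain not in FREEMAIL_DOMAINS:
            keys.append("@" + domain)  # domain level, skipped for freemail
        for key in keys:
            entry = stats[key]
            entry["spam" if is_spam else "ham"] += 1
            entry["users"].add(user_id)  # unique reporters
    return stats
```

Keeping a set of user IDs per key gives the count of unique reporters, which matters because the triggers depend on how many distinct users reported a sender rather than on the raw number of reports.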
Bayesian learning
When a user provides feedback, the raw email is also sent to Rspamd for Bayes training:
- Thumbs down (spam) → Rspamd learns this message pattern as spam
- Thumbs up (not spam) → Rspamd learns this message pattern as legitimate
This training is permanent and cumulative. The more feedback users provide, the more accurate the Bayes classifier becomes. Over time, the classifier builds a statistical model of what spam looks like in the Cleanbox ecosystem.
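The article does not say how Cleanbox submits messages for training. One common route is Rspamd's `rspamc` command-line client, whose `learn_spam` and `learn_ham` commands feed a message file to the Bayes classifier. The sketch below only builds the command; the helper name and the file path are hypothetical.

```python
from typing import List


def bayes_learn_command(message_path: str, is_spam: bool) -> List[str]:
    """Build the rspamc invocation that trains Bayes on one raw message file."""
    # Thumbs down maps to learn_spam, thumbs up to learn_ham.
    action = "learn_spam" if is_spam else "learn_ham"
    return ["rspamc", action, message_path]
```

The returned list could be passed to `subprocess.run` on a host where `rspamc` is installed and allowed to reach the Rspamd controller.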
Reading a spam report
Every message in Cleanbox includes a spam report accessible from the message detail page. The report shows:
- Total score — The sum of all symbol scores
- Symbol list — Every rule that triggered, with its individual score
- Authentication results — SPF, DKIM, DMARC pass/fail
Example report
Total score: 7.3
Symbols:
SPF_FAIL +2.0
DKIM_FAIL +1.5
BAYES_SPAM +3.0
CLEANBOX_UNCOMMON +1.0
URL_SHORTENED +0.8
MIME_HTML_ONLY -0.5
HAS_LIST_UNSUB -0.5
This message scored 7.3 because SPF and DKIM both failed (+3.5), the content matched spam patterns (+3.0), it came from a previously unseen sender (+1.0), and it contained shortened URLs (+0.8), for a subtotal of +8.3. Two negative symbols, MIME_HTML_ONLY and HAS_LIST_UNSUB (-0.5 each), reflect the HTML content and the List-Unsubscribe header (signs of legitimate bulk mail) and bring the total down to 7.3.
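To check the arithmetic, the symbol lines from the example report can be parsed and summed. This is a small sketch: the `parse_symbols` and `total_score` helpers are hypothetical, and the "NAME score" line format is taken from the example above rather than from a documented export format.

```python
REPORT = """\
SPF_FAIL +2.0
DKIM_FAIL +1.5
BAYES_SPAM +3.0
CLEANBOX_UNCOMMON +1.0
URL_SHORTENED +0.8
MIME_HTML_ONLY -0.5
HAS_LIST_UNSUB -0.5
"""


def parse_symbols(report: str) -> dict:
    """Turn 'NAME +1.5' lines into a {name: score} mapping."""
    symbols = {}
    for line in report.splitlines():
        if not line.strip():
            continue
        name, score = line.split()
        symbols[name] = float(score)
    return symbols


def total_score(symbols: dict) -> float:
    """Sum all symbol scores, rounded to one decimal like the report."""
    return round(sum(symbols.values()), 1)


symbols = parse_symbols(REPORT)
print(total_score(symbols))  # 7.3
```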
How thresholds use the score
Each address has two configurable thresholds:
- Quarantine threshold — Score at which messages are held for review
- Reject threshold — Score at which messages are rejected outright
See Choosing the Right Spam Threshold for recommendations per use case.
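A minimal sketch of how the two thresholds might be applied to a score. The function name, the "deliver" outcome, and the use of "greater than or equal" comparisons are assumptions, and the threshold values in the usage note are made up for illustration.

```python
def classify(score: float, quarantine_threshold: float,
             reject_threshold: float) -> str:
    """Map a spam score to an action using the two per-address thresholds."""
    # Assumption: a score exactly at a threshold triggers that action.
    if score >= reject_threshold:
        return "reject"
    if score >= quarantine_threshold:
        return "quarantine"
    return "deliver"
```

With hypothetical thresholds of 5.0 (quarantine) and 10.0 (reject), a message scoring 7.3 would be held for review: `classify(7.3, 5.0, 10.0)` returns `"quarantine"`.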
Relay-specific scoring
Relay addresses receive two additional checks that aliases do not:
- DNSBL (IP blacklists): The sending server's IP address is checked against Spamhaus, Barracuda, and SpamCop. A blacklisted IP results in immediate rejection before content is analyzed.
- Symbol rules: You can configure specific Rspamd symbols that trigger automatic rejection on relay addresses (e.g., always reject if URL_PHISHING is present).
These additional checks reflect the higher security requirements of relay-protected addresses, which typically handle business-critical email.
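The order of the relay checks can be sketched like this. The function, its parameters, and the returned strings are hypothetical; only the ordering (DNSBL first, then configured symbol rules) comes from the description above.

```python
def relay_decision(ip_blacklisted: bool, triggered_symbols, reject_symbols) -> str:
    """Apply the two relay-only checks before normal threshold handling."""
    # DNSBL hit: reject immediately, before any content analysis.
    if ip_blacklisted:
        return "reject: DNSBL"
    # Symbol rules: reject if any configured symbol was triggered.
    hits = sorted(set(triggered_symbols) & set(reject_symbols))
    if hits:
        return "reject: " + ", ".join(hits)
    return "continue to threshold check"
```

For example, `relay_decision(False, ["URL_PHISHING", "SPF_FAIL"], ["URL_PHISHING"])` returns `"reject: URL_PHISHING"`, regardless of the total score.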