Cleanbox

Understanding the spam scoring system in detail

Every email processed by Cleanbox receives a spam score calculated by Rspamd, an advanced spam scanning engine. This article explains exactly how scores are built, what each component means, and how the system improves over time.

Score components

The spam score is a sum of individual symbols. Each symbol represents a specific check that either passed or failed. Positive scores indicate spam signals. Negative scores indicate legitimate signals.

Authentication symbols

Symbol        Score      Meaning
SPF_PASS      Negative   SPF authentication passed
SPF_FAIL      Positive   SPF authentication failed
DKIM_PASS     Negative   DKIM signature valid
DKIM_FAIL     Positive   DKIM signature invalid or missing
DMARC_PASS    Negative   DMARC alignment passed
DMARC_FAIL    Positive   DMARC alignment failed

Content symbols

Symbol          Meaning
BAYES_SPAM      Bayesian classifier identified spam patterns in the content
BAYES_HAM       Bayesian classifier identified legitimate patterns
URL_PHISHING    Email contains links to known phishing domains
URL_SHORTENED   Email contains shortened URLs (bit.ly, etc.)

Cleanbox custom symbols

These symbols are unique to Cleanbox and are injected based on crowd-sourced sender reputation data:

Symbol                Score impact   Trigger
CLEANBOX_BLOCK        +8.0           10+ users reported sender, 90%+ spam ratio
CLEANBOX_QUARANTINE   +5.0           5+ users reported sender, 70%+ spam ratio
CLEANBOX_GREYLIST     +2.0           3+ users reported sender, more spam than ham
CLEANBOX_TRUSTED      -2.0           5+ teams have this sender whitelisted or prioritized
CLEANBOX_UNCOMMON     +1.0           Sender has never emailed any Cleanbox user before
CLEANBOX_BULK_ESP     +0.5           Sent via a known bulk email service provider
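The trigger column can be read as a small decision procedure. The sketch below encodes the four feedback-driven rows under assumptions this article does not confirm: the spam ratio is spam reports divided by all reports, rules are checked from strictest to most lenient, a sender gets at most one of these symbols, and the trusted rule only applies when no spam rule fires. SenderStats and reputation_symbol are hypothetical names. CLEANBOX_UNCOMMON and CLEANBOX_BULK_ESP are left out because they do not depend on user reports.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SenderStats:
    """Hypothetical aggregate for one sender (address or domain)."""
    reporting_users: int      # unique users who reported this sender
    spam_reports: int         # thumbs-down reports
    ham_reports: int          # thumbs-up reports
    trusting_teams: int = 0   # teams that whitelisted or prioritized the sender

    @property
    def spam_ratio(self) -> float:
        # Assumed definition: spam reports as a share of all reports.
        total = self.spam_reports + self.ham_reports
        return self.spam_reports / total if total else 0.0


def reputation_symbol(stats: SenderStats) -> Optional[str]:
    """Return at most one CLEANBOX_* reputation symbol, strictest rule first (assumed order)."""
    if stats.reporting_users >= 10 and stats.spam_ratio >= 0.90:
        return "CLEANBOX_BLOCK"        # +8.0
    if stats.reporting_users >= 5 and stats.spam_ratio >= 0.70:
        return "CLEANBOX_QUARANTINE"   # +5.0
    if stats.reporting_users >= 3 and stats.spam_reports > stats.ham_reports:
        return "CLEANBOX_GREYLIST"     # +2.0
    if stats.trusting_teams >= 5:
        return "CLEANBOX_TRUSTED"      # -2.0
    return None
```

For example, a sender reported by 12 users with 50 spam and 2 ham reports has a spam ratio of about 0.96, so reputation_symbol returns "CLEANBOX_BLOCK". Checking the strictest rule first means a sender that qualifies for several rows only receives the highest penalty.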

How reputation is built

Cleanbox reputation symbols are powered by aggregated user feedback:

  1. A user receives a message and marks it as spam (thumbs down) or not spam (thumbs up)
  2. The feedback is recorded per sender address and per sender domain
  3. Aggregation happens at both levels:
    • Address level: feedback specific to newsletter@spamshop.com
    • Domain level: feedback aggregated across all addresses at @spamshop.com (skipped for freemail domains like gmail.com)
  4. Based on the ratio of spam vs. ham reports and the number of unique users reporting, a recommendation is generated
  5. The recommendation is passed to Rspamd as a custom header, which triggers the corresponding CLEANBOX_* symbol
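Steps 2 and 3 can be sketched as an in-memory store that tallies each report twice, once under the full sender address and once under the sender domain, and skips the domain tally for freemail providers. FeedbackStore, Tally, and the short FREEMAIL_DOMAINS set are hypothetical; Cleanbox's actual storage, its freemail list, and the header name used in step 5 are not documented here.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical, abbreviated freemail list (the real list is not documented here).
FREEMAIL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}


@dataclass
class Tally:
    spam: int = 0
    ham: int = 0
    reporters: set = field(default_factory=set)  # unique user ids who reported

    def add(self, user_id: str, is_spam: bool) -> None:
        if is_spam:
            self.spam += 1
        else:
            self.ham += 1
        self.reporters.add(user_id)


class FeedbackStore:
    """Hypothetical in-memory sketch of per-address and per-domain aggregation."""

    def __init__(self) -> None:
        self.by_address: dict = defaultdict(Tally)
        self.by_domain: dict = defaultdict(Tally)

    def record(self, user_id: str, sender: str, is_spam: bool) -> None:
        address = sender.strip().lower()
        domain = address.rsplit("@", 1)[-1]
        self.by_address[address].add(user_id, is_spam)
        # Domain-level aggregation is skipped for freemail providers, presumably
        # because one bad gmail.com sender says nothing about the others.
        if domain not in FREEMAIL_DOMAINS:
            self.by_domain[domain].add(user_id, is_spam)

    def domain_tally(self, domain: str) -> Optional[Tally]:
        # .get() avoids creating an empty Tally for unknown domains.
        return self.by_domain.get(domain.lower())


# Example: two spamshop.com addresses and one gmail.com address reported as spam.
store = FeedbackStore()
store.record("user-1", "newsletter@spamshop.com", is_spam=True)
store.record("user-2", "deals@spamshop.com", is_spam=True)
store.record("user-3", "someone@gmail.com", is_spam=True)
```

After these three reports, each address has its own tally, spamshop.com has a domain tally with two spam reports from two users, and gmail.com has no domain tally at all.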

Bayesian learning

When a user provides feedback, the raw email is also sent to Rspamd for Bayes training:

  • Thumbs down (spam) → Rspamd learns this message pattern as spam
  • Thumbs up (not spam) → Rspamd learns this message pattern as legitimate

This training is permanent and cumulative. The more feedback users provide, the more accurate the Bayes classifier becomes. Over time, the classifier builds a statistical model of what spam looks like in the Cleanbox ecosystem.
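On a standard Rspamd install, the same spam/ham mapping can be reproduced with Rspamd's rspamc client, which provides learn_spam and learn_ham commands. The helper below only builds the command line; it assumes the raw message has been saved to a file, and bayes_learn_command is a hypothetical name. This is not necessarily how Cleanbox submits training internally, and depending on your Rspamd setup rspamc may also need the controller password.

```python
def bayes_learn_command(message_path: str, thumbs_up: bool) -> list[str]:
    """Build an rspamc command that trains Rspamd's Bayes classifier on one message.

    Hypothetical helper: thumbs down (spam) maps to learn_spam,
    thumbs up (not spam) maps to learn_ham.
    """
    command = "learn_ham" if thumbs_up else "learn_spam"
    return ["rspamc", command, message_path]
```

To actually train, pass the returned list to subprocess.run(..., check=True) on a host where rspamc can reach the Rspamd controller. Returning the argument list instead of a shell string avoids quoting problems with unusual file names.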

Reading a spam report

Every message in Cleanbox includes a spam report accessible from the message detail page. The report shows:

  • Total score — The sum of all symbol scores
  • Symbol list — Every rule that triggered, with its individual score
  • Authentication results — SPF, DKIM, DMARC pass/fail

Example report

Total score: 7.3

Symbols:
  SPF_FAIL            +2.0
  DKIM_FAIL           +1.5
  BAYES_SPAM          +3.0
  CLEANBOX_UNCOMMON   +1.0
  URL_SHORTENED       +0.8
  MIME_HTML_ONLY      -0.5
  HAS_LIST_UNSUB      -0.5

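This message scored 7.3 because its positive symbols add up to +8.3 and its two negative symbols subtract 1.0. On the spam side, SPF and DKIM both failed (+3.5), the content matched spam patterns (+3.0), it came from an unknown sender (+1.0), and it contained shortened URLs (+0.8). The negative scores came from the HTML body (MIME_HTML_ONLY) and the List-Unsubscribe header (HAS_LIST_UNSUB), which this example treats as signs of legitimate bulk mail.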
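To check the arithmetic yourself, the sketch below parses the plain-text report layout shown above and recomputes the total from the symbol lines. It assumes the exact format of this example (one symbol per indented line, each with a signed score); the real report page may be laid out differently, and parse_report is a hypothetical name.

```python
import re

# The example report from above, copied verbatim.
REPORT = """\
Total score: 7.3

Symbols:
  SPF_FAIL            +2.0
  DKIM_FAIL           +1.5
  BAYES_SPAM          +3.0
  CLEANBOX_UNCOMMON   +1.0
  URL_SHORTENED       +0.8
  MIME_HTML_ONLY      -0.5
  HAS_LIST_UNSUB      -0.5
"""

TOTAL_RE = re.compile(r"^Total score:\s*([+-]?\d+(?:\.\d+)?)", re.MULTILINE)
# An indented symbol name followed by a signed score (assumed layout).
SYMBOL_RE = re.compile(r"^[ \t]+([A-Z0-9_]+)[ \t]+([+-]\d+(?:\.\d+)?)[ \t]*$", re.MULTILINE)


def parse_report(text: str) -> tuple[float, dict[str, float]]:
    """Return (reported total, {symbol: score}) from the plain-text layout."""
    total_match = TOTAL_RE.search(text)
    if total_match is None:
        raise ValueError("no 'Total score' line found")
    symbols = {name: float(score) for name, score in SYMBOL_RE.findall(text)}
    return float(total_match.group(1)), symbols


reported, symbols = parse_report(REPORT)
computed = round(sum(symbols.values()), 1)
spam_signals = round(sum(s for s in symbols.values() if s > 0), 1)
legit_signals = round(sum(s for s in symbols.values() if s < 0), 1)
print(reported, computed, spam_signals, legit_signals)  # 7.3 7.3 8.3 -1.0
```

Rounding to one decimal place hides the tiny floating-point error that comes from adding values such as 0.8, so the recomputed total matches the reported 7.3.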

How thresholds use the score

Each address has two configurable thresholds:

  • Quarantine threshold — Score at which messages are held for review
  • Reject threshold — Score at which messages are rejected outright
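The two thresholds split the score range into three outcomes. The sketch assumes that a score equal to a threshold triggers it, that messages below the quarantine threshold are delivered normally, and that the quarantine threshold is the lower of the two. Action, action_for, and the example values 6.0 and 15.0 are illustrative, not Cleanbox defaults.

```python
from enum import Enum


class Action(Enum):
    DELIVER = "deliver"
    QUARANTINE = "quarantine"
    REJECT = "reject"


def action_for(score: float, quarantine_at: float, reject_at: float) -> Action:
    """Map a spam score to an outcome using an address's two thresholds (hypothetical helper)."""
    if quarantine_at > reject_at:
        # Assumed invariant: a quarantine threshold above the reject
        # threshold would make quarantine unreachable.
        raise ValueError("quarantine threshold must not exceed reject threshold")
    if score >= reject_at:
        return Action.REJECT
    if score >= quarantine_at:
        return Action.QUARANTINE
    return Action.DELIVER
```

With the example report's score of 7.3 and illustrative thresholds of 6.0 and 15.0, action_for returns Action.QUARANTINE. Checking the reject threshold first keeps the logic correct even when the two thresholds are equal.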

See Choosing the Right Spam Threshold for recommendations per use case.

Relay-specific scoring

Relay addresses receive two additional checks that aliases do not:

  • DNSBL (IP blacklists) — The sender server IP is checked against Spamhaus, Barracuda, and SpamCop. A blacklisted IP results in immediate rejection before content is analyzed.
  • Symbol rules — You can configure specific Rspamd symbols that trigger automatic rejection on relay addresses (e.g., always reject if URL_PHISHING is present).

These additional checks reflect the higher security requirements of relay-protected addresses, which typically handle business-critical email.
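The two relay checks can be sketched as separate steps, because the DNSBL lookup runs before content is analyzed while symbol rules need the scan results. A DNSBL is queried by reversing the IPv4 octets and appending the list's zone, so 127.0.0.2 becomes 2.0.0.127.zen.spamhaus.org; if that name resolves, the IP is listed. The zones below are the public ones commonly used for Spamhaus, Barracuda, and SpamCop, but Cleanbox's actual configuration is not documented here, and all function names are hypothetical. The sketch handles IPv4 only and ignores the specific return codes a list may use.

```python
import ipaddress
import socket
from typing import Callable, Iterable

# Assumed public zones for the three lists named above.
DNSBL_ZONES = ("zen.spamhaus.org", "b.barracudacentral.org", "bl.spamcop.net")

Resolver = Callable[[str], str]


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the IPv4 octets and append the zone (raises ValueError for non-IPv4 input)."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str, resolve: Resolver = socket.gethostbyname) -> bool:
    """An IP is listed if its query name resolves; a lookup failure is treated as not listed."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


def dnsbl_rejects(ip: str, resolve: Resolver = socket.gethostbyname) -> bool:
    """Step 1, before content analysis: any listing rejects the message."""
    return any(is_listed(ip, zone, resolve) for zone in DNSBL_ZONES)


def symbol_rule_rejects(fired: Iterable[str], reject_symbols: set[str]) -> bool:
    """Step 2, after scanning: reject if any configured symbol fired."""
    return not reject_symbols.isdisjoint(fired)
```

The resolver is a parameter so the lookup can be swapped out in tests or replaced with a resolver that has timeouts. For example, symbol_rule_rejects(["URL_PHISHING", "SPF_FAIL"], {"URL_PHISHING"}) returns True.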