How does Cleanbox detect spam?
Cleanbox uses multiple layers of spam detection to evaluate every incoming email. The result is a numerical spam score — the higher the score, the more likely the message is spam. This article explains each layer and how they work together.
Layer 1: Sender reputation
Before the email content is scanned, Cleanbox checks the sender's reputation. This is a crowd-sourced system built from feedback across all Cleanbox users:
- Feedback aggregation — When users mark messages as spam or not-spam (thumbs down/up), this data is aggregated per sender address and per sender domain
- Trust scoring — If 5 or more teams have a sender whitelisted or prioritized, that sender is considered trusted and gets a score reduction
- New sender detection — If a sender has never emailed any Cleanbox user before, they are flagged as an uncommon sender with a small score increase
Based on the aggregated feedback, the sender receives a recommendation:
| Recommendation | Trigger | Score impact |
|---|---|---|
| Accept | Default — no significant spam reports | None |
| Greylist | 3+ users reported, more spam reports than not-spam reports | +2.0 |
| Quarantine | 5+ users reported, 70%+ spam ratio | +5.0 |
| Block | 10+ users reported, 90%+ spam ratio | +8.0 |
| Trusted | 5+ teams whitelisted/prioritized | -2.0 |
| Uncommon | First contact across all Cleanbox users | +1.0 |
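The table can be read as a small decision function. The sketch below is illustrative only: the `SenderStats` fields, the interpretation of "users reported" as the number of distinct users who gave feedback, and the order in which overlapping rules are checked (strictest first) are all assumptions, not documented Cleanbox behavior.

```python
from dataclasses import dataclass


@dataclass
class SenderStats:
    # Hypothetical container; field names are not part of Cleanbox.
    reporting_users: int         # distinct users who gave feedback on this sender
    spam_reports: int            # thumbs-down count
    ham_reports: int             # thumbs-up (not-spam) count
    trusting_teams: int = 0      # teams that whitelisted or prioritized the sender
    first_contact: bool = False  # never seen before by any Cleanbox user


# Score impacts copied from the recommendation table.
SCORE_IMPACT = {
    "Accept": 0.0,
    "Greylist": 2.0,
    "Quarantine": 5.0,
    "Block": 8.0,
    "Trusted": -2.0,
    "Uncommon": 1.0,
}


def recommend(stats: SenderStats) -> str:
    """Map aggregated feedback to a recommendation name."""
    total = stats.spam_reports + stats.ham_reports
    ratio = stats.spam_reports / total if total else 0.0

    # Assumed precedence: the strictest matching spam rule wins.
    if stats.reporting_users >= 10 and ratio >= 0.90:
        return "Block"
    if stats.reporting_users >= 5 and ratio >= 0.70:
        return "Quarantine"
    if stats.reporting_users >= 3 and stats.spam_reports > stats.ham_reports:
        return "Greylist"
    if stats.trusting_teams >= 5:
        return "Trusted"
    if stats.first_contact:
        return "Uncommon"
    return "Accept"
```

For example, a sender with 12 reporting users, 11 spam reports, and 1 not-spam report has a spam ratio of about 92% and would map to Block under these assumptions.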
Layer 2: Rspamd content analysis
The full email (headers + body) is sent to Rspamd, an advanced spam scanning engine. Rspamd performs dozens of checks simultaneously:
- Bayes classifier — Machine learning model trained on spam and legitimate emails. Continuously improved by user feedback (thumbs up/down).
- Authentication checks — Verifies SPF, DKIM, and DMARC. Failed authentication adds to the spam score.
- URL analysis — Checks links against known phishing, malware, and spam URL databases.
- Header analysis — Checks for forged or suspicious email headers, missing required fields, and signs of mass mailing software.
- Content patterns — Detects common spam phrases, suspicious formatting, and known spam signatures.
- Cleanbox custom symbols — The sender reputation data from Layer 1 is injected as custom scoring symbols (CLEANBOX_BLOCK, CLEANBOX_TRUSTED, etc.)
Each check that triggers produces a symbol with a score, which can be positive or negative. These are summed into the total spam score. A typical legitimate email scores 0–2. Obvious spam often scores 10+.
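To make the summation concrete, here is a minimal sketch. The symbol weights are made up for illustration, and `HYPOTHETICAL_CONTENT_RULE` is not a real rule name. Only `CLEANBOX_BLOCK` comes from this article, and its +8.0 assumes it carries the Block score impact from Layer 1.

```python
from typing import List, Mapping, Tuple


def total_score(symbols: Mapping[str, float]) -> float:
    """Sum the score contributions of all triggered symbols."""
    return sum(symbols.values())


def top_contributors(symbols: Mapping[str, float], n: int = 3) -> List[Tuple[str, float]]:
    """Return the n symbols that add the most to the score, highest first."""
    return sorted(symbols.items(), key=lambda item: item[1], reverse=True)[:n]


# Illustrative weights only; real rule weights depend on the Rspamd configuration.
legitimate = {
    "R_SPF_ALLOW": -0.2,
    "R_DKIM_ALLOW": -0.2,
    "HYPOTHETICAL_CONTENT_RULE": 1.5,
}
spam = {
    "BAYES_SPAM": 5.0,
    "R_SPF_FAIL": 1.0,
    "CLEANBOX_BLOCK": 8.0,  # assumed to carry the Block impact from Layer 1
}

print(round(total_score(legitimate), 2))
print(round(total_score(spam), 2))
print(top_contributors(spam, n=1))
```

With these invented weights the legitimate message lands in the 0–2 range and the spam message lands above 10, matching the typical ranges described above.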
Layer 3: Virus scanning (Relay only)
For relay-protected addresses, Cleanbox also runs ClamAV antivirus scanning. If a virus is detected, the message is immediately rejected — regardless of spam score or any other rules. This check runs before the spam threshold evaluation.
Layer 4: IP blacklist checks (Relay only)
For relay addresses, the sending server IP is checked against DNS-based blackhole lists (DNSBL):
- Spamhaus — The most comprehensive spam IP database
- Barracuda — Enterprise-grade reputation data
- SpamCop — Community-reported spam sources
If the IP is blacklisted, the message is rejected at the SMTP level before content is even processed.
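A DNSBL lookup works by reversing the octets of the IPv4 address, appending the list's DNS zone, and checking whether that name resolves. The sketch below shows the general technique. The zone names are the public zones commonly associated with these providers; which zones Cleanbox actually queries, and how it interprets the answers, is an assumption.

```python
import ipaddress
import socket

# Commonly used public zones; Cleanbox's actual configuration is not documented here.
DNSBL_ZONES = ["zen.spamhaus.org", "b.barracudacentral.org", "bl.spamcop.net"]


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the lookup name: reversed IPv4 octets followed by the DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for invalid input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers with an address for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No such name: the IP is not listed in this zone.
        return False


print(dnsbl_query_name("192.0.2.99", "zen.spamhaus.org"))
# prints "99.2.0.192.zen.spamhaus.org"
```

A production check would normally also inspect the returned address, because list operators use specific return codes to signal the listing type or a query error. This sketch treats any answer as a listing.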
How the spam score is used
Each alias and relay address has two configurable thresholds:
| Threshold | What happens |
|---|---|
| Quarantine threshold | Score meets or exceeds this value → message is held in quarantine for review |
| Spam threshold | Score meets or exceeds this value → message is rejected outright |
The quarantine threshold is always lower than the spam threshold, creating three zones:
Score below quarantine threshold → Deliver normally
Score at or above quarantine threshold, below spam threshold → Quarantine (hold for review)
Score at or above spam threshold → Reject (definite spam)
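The three zones reduce to two comparisons. This sketch follows the "meets or exceeds" wording from the threshold table. The function name and the threshold values 5.0 and 10.0 are hypothetical, chosen only for the example.

```python
def classify(score: float, quarantine_threshold: float, spam_threshold: float) -> str:
    """Return 'deliver', 'quarantine', or 'reject' for a total spam score."""
    if quarantine_threshold >= spam_threshold:
        raise ValueError("quarantine threshold must be lower than spam threshold")
    if score >= spam_threshold:
        return "reject"
    if score >= quarantine_threshold:
        return "quarantine"
    return "deliver"


# Hypothetical thresholds for illustration.
for score in (-2.0, 1.5, 5.0, 9.9, 10.0):
    print(score, classify(score, quarantine_threshold=5.0, spam_threshold=10.0))
```

Negative scores, which are possible when a trusted sender receives a score reduction, fall into the deliver zone like any other score below the quarantine threshold.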
The feedback loop
Spam detection improves over time through a feedback loop:
- User receives a message and marks it as spam (thumbs down) or not-spam (thumbs up)
- The raw email is sent to Rspamd for Bayes learning — training the classifier on real examples
- The feedback count for that sender is incremented in the reputation database
- Future emails from that sender receive adjusted Cleanbox symbols based on the aggregated feedback
- The combined effect of Bayes learning + sender reputation makes detection more accurate over time
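The reputation side of this loop can be sketched as a pair of counters kept per sender address and per sender domain. `ReputationStore` is a hypothetical in-memory stand-in for the reputation database, not Cleanbox's actual schema, and the Bayes training step is left out.

```python
from collections import defaultdict


class ReputationStore:
    """In-memory stand-in for the reputation database (hypothetical)."""

    def __init__(self) -> None:
        # key -> {"spam": n, "ham": n}; keys are full addresses and domains
        self.counts = defaultdict(lambda: {"spam": 0, "ham": 0})

    def record_feedback(self, sender: str, is_spam: bool) -> None:
        """Increment the feedback count for the sender address and its domain."""
        address = sender.strip().lower()
        domain = address.rsplit("@", 1)[-1]
        label = "spam" if is_spam else "ham"
        # A set avoids double counting if the input has no "@" part.
        for key in {address, domain}:
            self.counts[key][label] += 1

    def spam_ratio(self, key: str) -> float:
        """Share of feedback marked as spam for an address or domain."""
        entry = self.counts.get(key.lower())
        if entry is None:
            return 0.0
        total = entry["spam"] + entry["ham"]
        return entry["spam"] / total if total else 0.0


store = ReputationStore()
store.record_feedback("Promo@Example.com", is_spam=True)
store.record_feedback("promo@example.com", is_spam=True)
store.record_feedback("news@example.com", is_spam=False)
print(store.counts["example.com"])
```

Aggregating at both levels means a domain can build up a reputation even when individual addresses on it have only received a few reports each.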
Viewing spam details
Every processed message includes a detailed spam report accessible from the message detail page. The report shows:
- The total spam score
- Every individual rule that triggered, with its name and score contribution
- Authentication results (SPF pass/fail, DKIM pass/fail, DMARC pass/fail)
- Which Cleanbox reputation symbols were applied
- Whether a virus was detected (relay addresses only)
This transparency lets you understand exactly why a message was delivered, quarantined, or rejected — and adjust your thresholds accordingly.