Cleanbox

How spam training works

Cleanbox uses a machine-learning spam filter (Rspamd with a Bayesian classifier) that improves over time based on user feedback. When you mark a message as spam or legitimate, you are directly training the filter. This article explains what happens behind the scenes and how to use feedback effectively.

How to provide feedback

On the message detail page (click any message in your message log or quarantine), you will see two buttons:

  • Spam — "This email is unwanted." Tells Cleanbox this message should have been caught.
  • Legitimate — "This email is useful." Tells Cleanbox this message should not have been blocked or quarantined.

You can submit feedback only once per message. After you submit it, the feedback buttons become inactive.

What happens when you submit feedback

Two things happen immediately:

  1. Bayesian training — The raw email content is sent to Rspamd's Bayesian classifier. If you marked it as spam, the classifier learns that the words, patterns, and structures in that email are spam signals. If you marked it as legitimate, it learns those patterns are normal. Over time, this makes the classifier more accurate at scoring similar emails.
  2. Sender reputation update — Your feedback is recorded with the sender's email address and domain. This data feeds into Cleanbox's crowd-sourced reputation system — aggregated across all Cleanbox users, not just your account.
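
The two steps above can be sketched in a few lines of Python. This is an illustration only, not Cleanbox's actual code: the article does not say how Cleanbox talks to Rspamd. The sketch assumes the Rspamd controller worker's HTTP learning endpoints (`/learnspam` and `/learnham`) on the default controller port 11334, and the names `FeedbackRecord`, `build_learn_request`, and `build_feedback_record` are hypothetical.

```python
from dataclasses import dataclass
from urllib.request import Request

# Assumption: an Rspamd controller worker reachable on its default port.
RSPAMD_CONTROLLER = "http://localhost:11334"


@dataclass
class FeedbackRecord:
    """Hypothetical record stored for the sender reputation update."""
    sender_address: str
    sender_domain: str
    is_spam: bool


def build_learn_request(raw_email: bytes, is_spam: bool, password: str) -> Request:
    """Build (but do not send) the request that trains the Bayesian classifier."""
    endpoint = "/learnspam" if is_spam else "/learnham"
    return Request(
        RSPAMD_CONTROLLER + endpoint,
        data=raw_email,                  # the raw message, headers included
        headers={"Password": password},  # controller password header
        method="POST",
    )


def build_feedback_record(sender_address: str, is_spam: bool) -> FeedbackRecord:
    """Pair the feedback with the sender's address and domain."""
    # Simplification: lowercase the whole address and take the part after the last "@".
    address = sender_address.strip().lower()
    domain = address.rsplit("@", 1)[-1]
    return FeedbackRecord(address, domain, is_spam)
```

Passing the returned request to `urllib.request.urlopen` would perform the training call. Building the request separately from sending it keeps the sketch easy to inspect without a running Rspamd instance.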

How crowd-sourced reputation works

Cleanbox aggregates feedback from all users to build sender reputation scores. These scores are injected into the spam scanning process as custom symbols:

Signal | What it means
Trusted sender | 5 or more teams have whitelisted or prioritized this sender. Score is reduced (less likely to be flagged as spam).
Blocked sender | 10 or more users reported this sender, with 90%+ marking it as spam. Score is significantly increased.
Quarantine-level sender | 5 or more users reported this sender, with 70%+ spam ratio. Score is moderately increased.
Greylist-level sender | 3 or more users reported this sender, with more spam than legitimate reports. Score is slightly increased.
Uncommon sender | First-ever contact across all Cleanbox users. Score is slightly increased as a precaution.

This means your feedback helps not just your own inbox, but every Cleanbox user. When a spammer targets multiple Cleanbox users, the first few reports trigger automatic protection for everyone else.
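
To make the report-based thresholds concrete, here is a minimal sketch of how the blocked, quarantine, and greylist levels could be derived from report counts. It is an assumption-laden illustration rather than Cleanbox's implementation: the function name `report_tier` is hypothetical, each user is assumed to contribute at most one report per sender, and the strictest matching level is assumed to win. The trusted and uncommon signals depend on other data and are left out.

```python
from typing import Optional


def report_tier(spam_reports: int, legit_reports: int) -> Optional[str]:
    """Return the strictest report-based level that applies, or None."""
    total = spam_reports + legit_reports
    # Integer arithmetic avoids floating-point edge cases at exactly 90% or 70%.
    if total >= 10 and spam_reports * 100 >= 90 * total:
        return "blocked"
    if total >= 5 and spam_reports * 100 >= 70 * total:
        return "quarantine"
    if total >= 3 and spam_reports > legit_reports:
        return "greylist"
    return None


# Three users report a new spammer: the sender reaches greylist level.
print(report_tier(3, 0))
# Later, 19 spam reports against 1 legitimate report: blocked level.
print(report_tier(19, 1))
```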

When to provide feedback

Mark as spam when:

  • An unwanted email was delivered to your inbox (the filter missed it)
  • A marketing email you did not sign up for got through
  • A phishing or scam email was not caught

Mark as legitimate when:

  • A wanted email ended up in quarantine
  • A message from a known sender was incorrectly flagged
  • A transactional email (order confirmation, password reset) was blocked

Do not provide feedback for:

  • Emails you simply do not want anymore — use unsubscribe instead
  • Emails that were already correctly handled (delivered wanted email, blocked obvious spam) — feedback is only useful when the filter made a mistake

Requirements

  • The message must still be within your plan's retention period — the raw email content needs to be available for Bayesian training.
  • The message must have an associated contact (almost all messages do, unless the sender could not be identified).
  • You need write permission on Messages if you are a team member (not owner).
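
The requirements above, together with the once-per-message rule, amount to a simple eligibility check. The sketch below is hypothetical: the `Message` fields, the `can_submit_feedback` function, and the idea of expressing retention as a number of days are assumptions made for illustration, not a description of Cleanbox's data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Message:
    """Hypothetical view of a logged message."""
    received_at: datetime
    contact_id: Optional[int]      # None when the sender could not be identified
    feedback_given: bool = False   # feedback is allowed only once per message


def can_submit_feedback(
    message: Message,
    retention_days: int,
    is_owner: bool,
    has_messages_write: bool,
    now: Optional[datetime] = None,
) -> bool:
    """Check every requirement before the feedback buttons are enabled."""
    now = now or datetime.now(timezone.utc)
    within_retention = now - message.received_at <= timedelta(days=retention_days)
    has_contact = message.contact_id is not None
    has_permission = is_owner or has_messages_write
    return (
        within_retention
        and has_contact
        and has_permission
        and not message.feedback_given
    )
```

For example, a team member without write permission on Messages would get `False` here even for a recent message with a known contact.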