
How does Bayesian Analysis for inbound mail work, and how do I keep it running accurately?

  • Type: Knowledgebase
  • Date changed: 11 months ago
Solution #00006284

All Email Security Gateways, firmware version 3.3+

Please note: Barracuda Networks does not recommend using Bayesian filtering in most circumstances. With Energize Updates constantly updating the Barracuda Spam Firewall with protection against the latest spam and virus threats, spam accuracy should not be an issue for most organizations. That said, if used correctly, the Bayesian Database can be a powerful tool in helping to stop spam from getting to your users.

How Bayesian Analysis Works

Bayesian Analysis is a linguistic algorithm that profiles language used in both spam messages and legitimate email for any particular user or organization. To determine the likelihood that a new email is spam, Bayesian Analysis compares the words and phrases used in the new email against the corpus of previously identified email. Note that Bayesian training works only on messages with 11 words or more. The Email Security Gateway only uses Bayesian Analysis after administrators or users classify at least 200 legitimate messages and 200 spam messages.
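The following is a minimal sketch of the idea described above, not the Email Security Gateway's actual engine. It assumes a textbook word-level naive Bayes classifier with add-one smoothing. The class name BayesSketch, the tokenizer, and the labels "spam" and "ham" are hypothetical. Only the 11-word minimum and the 200/200 threshold come from this article.

```python
import math
import re
from collections import Counter

MIN_WORDS = 11        # from the article: shorter messages are not used for training
MIN_PER_CLASS = 200   # from the article: 200 spam and 200 legitimate messages needed


class BayesSketch:
    """Toy word-level naive Bayes filter (illustrative assumption only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.msg_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Hypothetical tokenizer: lowercase runs of letters, digits, apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Add one classified message; return False if it is too short."""
        words = self.tokenize(text)
        if len(words) < MIN_WORDS:
            return False
        self.word_counts[label].update(words)
        self.msg_counts[label] += 1
        return True

    def is_active(self):
        """Scoring is used only once both classes reach the minimum."""
        return all(n >= MIN_PER_CLASS for n in self.msg_counts.values())

    def spam_probability(self, text):
        """Return P(spam | text), or None while the database is too small."""
        if not self.is_active():
            return None
        words = self.tokenize(text)
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_msgs = sum(self.msg_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(vocab)  # add-one smoothing
            score = math.log(self.msg_counts[label] / total_msgs)
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            log_score[label] = score
        # Turn the log-odds into a probability; clamp to avoid overflow.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

In this sketch, spam_probability returns None until 200 messages of each kind have been trained, mirroring the threshold described above.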

Global Bayesian Filtering Versus Per-User

The administrator can turn the Bayesian database on or off from the BASIC > Spam Checking page.

There are two (2) Bayesian Databases: the global database, which affects all users, and the Per User database, which users can build if they are using the Barracuda Outlook Addin.

Note: Because spammers frequently change tactics and content, Bayesian data can quickly become "stale" if the database is not reset from time to time and if new messages are not classified as spam or not spam in equal numbers. Without this maintenance, users may see false positives that result in good email being blocked.

Getting the Best Accuracy From the Bayesian Database

All Bayesian systems rely on the assumption that newly arriving messages resemble the messages already classified. Over time, however, spam messages change drastically, and the Bayesian system, while initially able to compensate for the new format, gradually declines in effectiveness. When this happens, new classifications are needed to update the Bayesian database. To keep a Bayesian database accurate:

* There are two schools of thought on keeping the Global Bayesian Database accurate.

The first is to start slowly: classify only a few (3-5) messages per day until you reach the minimum of 200 SPAM and 200 NOT SPAM messages. Once you reach that level, classify 1-3 spam and not spam messages each day. This keeps your database up to date with current spam trends.

The second method (which is not recommended) is to classify 200 spam and 200 not spam messages as quickly as possible and then classify large groups of messages every few weeks to keep up with spam and legitimate mail trends.

* For each per-user database, the user should reset their own Bayesian database and follow up with marking 200 or more messages as spam or not spam, either in their quarantine inbox (QUARANTINE > Quarantine Inbox page) or from their regular email client if they have installed the Barracuda Outlook Addin.

* Do not classify large messages with attachments. The Bayesian routines include the attachment in the breakdown, which uses up your Bayesian tokens in a hurry. Once all tokens are used up, newly classified messages are ignored.

* It is also important to classify more NOT SPAM messages than SPAM messages. A Bayesian database works best when more good than bad is classified. In this case SPAM is on the bad side of things.

* Some people clear out and retrain their Bayesian Database every 6 months or so. If you start seeing a lot of legitimate mail getting a high score due to Bayesian scoring, then you may need to do this. If, however, your Bayesian Database is working well for you and keeping up with spam trends, then there is no reason to clear it out.
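The maintenance guidelines above can be sketched as a small bookkeeping helper. This is an illustration only and not part of the Email Security Gateway. The TrainingLog class, its method names, and the label strings are hypothetical. The 200-message minimum, the advice to skip messages with attachments, and the advice to classify more NOT SPAM than SPAM come from this article.

```python
from dataclasses import dataclass

MIN_PER_CLASS = 200  # from the article: minimum classified messages per class


@dataclass
class TrainingLog:
    """Hypothetical tally of classifications made against one database."""
    spam: int = 0
    not_spam: int = 0

    def record(self, label, has_attachment=False):
        """Count one classification; skip messages that carry attachments."""
        if has_attachment:
            return False  # attachments would consume tokens quickly
        if label == "spam":
            self.spam += 1
        elif label == "not_spam":
            self.not_spam += 1
        else:
            raise ValueError("label must be 'spam' or 'not_spam'")
        return True

    def advice(self):
        """Return reminders based on the guidelines in this article."""
        notes = []
        if self.spam < MIN_PER_CLASS or self.not_spam < MIN_PER_CLASS:
            notes.append("below minimum: Bayesian scoring not yet in use")
        if self.not_spam < self.spam:
            notes.append("classify more NOT SPAM messages")
        return notes
```

An administrator could call record() each time a message is classified and check advice() weekly to see whether the database is drifting out of balance.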

When to Use Bayesian Analysis

Barracuda Networks does not recommend using Bayesian filtering in most circumstances. With Energize Updates constantly updating the Email Security Gateway with protection against the latest spam and virus threats, spam accuracy should not be an issue for most organizations. Also, many administrators and users classify large quantities of spam messages and very few not spam messages, which weights the scoring of all mail heavily towards spam.
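To see why lopsided training skews scores, consider the prior in a textbook naive Bayes classifier. This is an assumption for illustration, because the article does not describe how the Barracuda engine weights its classes. The function name spam_prior is hypothetical.

```python
def spam_prior(spam_trained, not_spam_trained):
    """Fraction of trained messages that were spam (textbook prior)."""
    total = spam_trained + not_spam_trained
    if total == 0:
        raise ValueError("no messages have been classified")
    return spam_trained / total


balanced = spam_prior(200, 200)   # equal training
lopsided = spam_prior(900, 100)   # mostly spam classified
```

Under this assumption, a message with no distinguishing words starts at even odds with balanced training, but starts at a 90% spam likelihood when nine of every ten classified messages were spam.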

A case for using Bayesian Analysis would depend on the following:

* You are using global Bayesian as opposed to per-user, and the users in the organization tend to be a homogeneous population with regard to the kind of content considered to be 'valid' email versus spam. This situation would make it easier for an administrator to "train" the global Bayesian database as to what is spam and what is not spam for the organization.

* Your organization requires a very high granularity of accuracy for identifying spam.

* If enabling Bayesian at the per-user level, users are sophisticated and can be trained to properly identify 'valid' messages versus spam so as to train the Bayesian database, and are willing to consistently mark BOTH 'valid' messages and spam messages in equal numbers so as to maintain the Bayesian database.

* The administrator and/or users are disciplined about resetting the Bayesian database(s) on a regular basis and re-initializing with 200 each of marked spam and not spam messages to 'keep current' with new spam techniques over time.

Outlook and Lotus Notes Plugins

If both per-user quarantine and per-user Bayesian are enabled, on the Email Security Gateway 300 and higher, the administrator can choose to allow users to download a plugin that allows messages to be classified as Spam or Not Spam directly from their email client. Users must have a quarantine account on the Email Security Gateway to use the plugins. For information about automatically or manually creating quarantine accounts for users, see Creating and Managing Accounts. For more information about the Microsoft Outlook Add-in, see Mail Client Add-Ins.

Bayesian Poisoning

Some spammers insert content intended to bypass spam rules, such as excerpts of text from books or other content that may look "legitimate", in order to fool spam filtering algorithms. This tactic is called Bayesian Poisoning and could reduce the effectiveness of a Bayesian database if many of these messages are marked as either spam or not spam. The Barracuda Networks Bayesian engine is, however, very sophisticated and protects against Bayesian Poisoning as long as administrators or users consistently maintain their databases.
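The effect of padding can be illustrated with a small calculation. The sketch below assumes the common textbook rule for combining independent per-word spam probabilities. It is not a description of the Barracuda engine, and the per-word values are made up for the example.

```python
from math import prod


def combine(word_probs):
    """Combine per-word spam probabilities, assuming independence."""
    spam = prod(word_probs)
    ham = prod(1 - p for p in word_probs)
    return spam / (spam + ham)


spam_words = [0.95, 0.90, 0.99]   # hypothetical scores for spam-like words
padding = [0.10] * 4              # hypothetical scores for "legitimate" filler

plain = combine(spam_words)              # spam words only
padded = combine(spam_words + padding)   # same message with filler text added
```

With these made-up numbers, the combined score falls from above 0.99 for the plain message to roughly 0.72 once four filler words are added, which is the dilution that poisoning aims for.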

Additional Notes:
For more information about this subject, please see the following Techlib article: http://techlib.barracuda.com/display/BSFv51/Bayesian+Analysis+-+Inbound+Mail
