Defeat spam with SpamBayes


Author: Peter Ihme

Spam email is the plague of the 21st century; SpamBayes is its cure. This client-side application analyzes all incoming email messages and automatically sorts out those that are unwanted.

SpamBayes digests the contents of email messages and counts how often certain words — e.g. Viagra — occur in spam (bad) or ham (good) messages. Based on these word patterns, it calculates an overall score that rates a message as spam, ham, or unknown. You can manually classify unknown mail as spam or ham and SpamBayes will learn accordingly.
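To make the idea of counting words and turning the counts into a score more concrete, here is a minimal sketch in modern Python. It is not SpamBayes code and does not reproduce its actual scoring method. It is a simplified naive Bayes classifier with equal priors and add-one smoothing, and the class name, the tokenizer, and the two cutoff values (0.2 and 0.9) are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: lowercase words, digits, "$" and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9$']+")


def tokenize(text):
    """Return the set of distinct tokens found in the text."""
    return set(TOKEN_RE.findall(text.lower()))


class TinyClassifier:
    """Illustrative word-count classifier; not the SpamBayes implementation."""

    def __init__(self):
        self.spam_counts = Counter()  # token -> number of spam messages containing it
        self.ham_counts = Counter()   # token -> number of ham messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        """Record the tokens of one manually classified message."""
        tokens = tokenize(text)
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def score(self, text):
        """Return a value between 0 (ham) and 1 (spam)."""
        log_odds = 0.0
        for tok in tokenize(text):
            # Add-one smoothing keeps unseen tokens from producing zero counts.
            p_spam = (self.spam_counts[tok] + 1) / (self.n_spam + 2)
            p_ham = (self.ham_counts[tok] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp to avoid overflow in exp() for very long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))

    def classify(self, text, ham_cutoff=0.2, spam_cutoff=0.9):
        """Map the score to 'spam', 'ham', or 'unknown' (cutoffs are assumed)."""
        s = self.score(text)
        if s >= spam_cutoff:
            return "spam"
        if s <= ham_cutoff:
            return "ham"
        return "unknown"


clf = TinyClassifier()
clf.train("cheap viagra buy now", is_spam=True)
clf.train("viagra discount offer", is_spam=True)
clf.train("meeting tomorrow at noon", is_spam=False)
clf.train("lunch with anna tomorrow", is_spam=False)
print(clf.classify("cheap viagra discount"))
print(clf.classify("meeting tomorrow"))
```

Every call to train() shifts the word counts, which is why a single manual correction can change how later messages from the same sender are rated.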

The SpamBayes classification sorts out virtually all spam messages and almost never produces a false positive — that is, a good message wrongly identified as spam. Only once have I had to fetch an email from the junk mail folder. This happened when a Spanish friend wrote me, presumably because Spanish messages are rare in my inbox. I corrected the wrong classification, and all her subsequent messages were recognized as good. The program improves precision with each manual correction.

SpamBayes can be run as an Outlook plugin under Windows or as a POP3 or IMAP proxy under Windows, Linux/Unix, and Mac OS.

Outlook plugin

The Outlook plugin comes with an installer. The installation takes about three minutes. After installation, you need to train the program. You can prepare two folders — one with good and one with bad messages — and feed them to the program to do batch training, or you can skip the batch training and train the program while you’re using it.

SpamBayes works almost invisibly in the background. It analyzes each incoming message, and moves spam to the junk mail folder and junk suspects to the junk suspect folder. Two icons in the toolbar, a happy and a sad smiley, allow you to manually classify messages and train SpamBayes.

POP3 proxy

You can also run SpamBayes as a POP3 proxy. In this mode, you need to configure your email client to talk to the SpamBayes proxy, which talks to the actual email server. When you download email, messages have to pass through the SpamBayes proxy, which marks them as spam or ham with a custom line in the header. All you need to do is to define filters in your email client to sort incoming messages into the appropriate folders according to their mark.

To run SpamBayes as a POP3 proxy, you need Python 2.2 or later and version 2.4.3 or later of the Python email package. The SpamBayes Web site contains a detailed description of the installation procedure. It boils down to three steps:

  1. Expand the SpamBayes archive and change to that directory
  2. Run python setup.py install to install SpamBayes
  3. Finally, start the server by launching sb_server.py

Open http://localhost:8880/ in your browser and click the Configuration link at the top. Enter the details of your POP3 and SMTP servers. If you use several email accounts, configure a POP3 and SMTP server for each of them.

Next, configure your email client to talk to the SpamBayes proxy instead of the mail server. You need to change both the POP3 and the SMTP address. Make sure that your mail client downloads the entire message by default. If it downloads only the message headers, SpamBayes can classify only the headers, which is not recommended. Finally, create a folder for spam and define a rule that moves all messages with an X-Spambayes-Classification: spam header into that folder.
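The filter rule is normally defined in the mail client itself, but its logic is simple enough to sketch in Python. The example below reads the classification header from a raw message and picks a folder. The folder names and the function name are hypothetical, and the handling of extra text after a semicolon in the header value is a defensive assumption rather than documented behavior.

```python
from email import message_from_string

# Header added by the SpamBayes proxy; header lookup is case-insensitive.
HEADER = "X-Spambayes-Classification"


def target_folder(raw_message, spam_folder="Spam", inbox="Inbox"):
    """Return the folder a message should be filed in (folder names are assumed)."""
    msg = message_from_string(raw_message)
    value = msg.get(HEADER) or ""
    # Keep only the verdict word in case the value carries extra details.
    verdict = value.split(";")[0].strip().lower()
    return spam_folder if verdict == "spam" else inbox


sample = (
    "From: someone@example.com\n"
    "Subject: Special offer\n"
    "X-Spambayes-Classification: spam\n"
    "\n"
    "Buy now!\n"
)
print(target_folder(sample))
```

Messages without the header, or with any verdict other than spam, stay in the inbox in this sketch, so nothing is hidden from view if the proxy was bypassed.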

There are two ways to train SpamBayes when it runs as a POP3 proxy: forward spam messages to spambayes-spam@localhost and ham messages to spambayes-ham@localhost, or use the Web interface of the SpamBayes POP3 proxy. Follow the Review Messages link to see a list of recently received messages, which you can then classify.

My two cents

SpamBayes is a wonderful helper I can no longer go online without. It identifies spam so reliably that I no longer even look into my spam folder.