Evolution's filtering tools for handling incoming messages provide the raw material for spam detection. However, the difficulty lies in knowing which characteristics of incoming mail should be treated as signs of spam. Information I gleaned from the Internet was only moderately useful; most of it was incomplete, obsolete, or inaccurate.
To find a solution, I pored over the headers of messages that Mozilla Thunderbird had detected as spam. From this research, I isolated the most common characteristics of spam and built several filters without leaving Evolution. Wanting to further improve spam detection, I spent several evenings testing various instructions for linking Evolution with SpamAssassin through a filter until I found one that worked. Taken together, these filters provided all the spam filtering I needed to remove my last obstacle to using Evolution.
Evolution's filtering tools
Evolution's filter rules are created from Tools > Filters > Add. Each rule consists of an If statement, which sets the conditions under which the rule applies, and a Then statement, which defines what happens when the conditions are met. You create both If and Then statements by selecting the Add button in the appropriate pane, then selecting from drop-down boxes or typing in fields. By default, a rule applies when all If statements are met, but you can also set the Execute action drop-down list to if any criteria are met. For convenience, give each rule an appropriate name, so you don't have to open it to know what the rule does.
For spam filters, If statements have many possibilities. Focusing on a specific message header, such as the Sender, Subject, or Recipient, or on a message's contents, If statements can detect an exact phrase using the is building block, or part of a phrase using contains, starts with, or ends with. You can also introduce a degree of fuzziness by using sounds like or regular expressions. Many of these search patterns also have an opposite, such as is not. If none of these patterns, alone or in combination, does what you want, you can select Pipe to Program to call a shell command, such as grep.
By contrast, you'll need only a limited number of Then statements. Before you write Then statements, you'll want to create a Spam folder; Evolution comes with a Junk folder, but, in Debian, you can't use it when writing a filter rule from the GUI, not even if you modify /home/[user]/.evolution/mail/filters.xml in a text editor. For each spam rule, you'll want to use three Then statements in this order:
Move to Folder Spam
Set Status Read
Stop Processing
This set of Then statements delivers the email to the Spam folder and marks it as read, so you aren't distracted by the appearance of unread messages in the folder. It then keeps other filters from being applied to the same message and delivering it to other folders. Once you've tested your rules together, you may want to change the second statement to Set Status Deleted. However, until you're confident with your set of rules, this choice may cause you to lose legitimate messages.
While testing the results, you may want to assign a color or a sound to each piece of email marked by a rule, so you can see how many messages it is catching.
Completed rules are listed in the Tools > Filters window. Evolution processes them sequentially, so the order in which they're listed in the Filters window can affect how useful a rule is. For example, a rule that moves messages from person@address.com to a particular folder will never come into effect if a rule that is processed first deletes all mail in which the sender's address contains @address.com. Since combinations of rules can have unexpected results, the Filters window includes buttons for moving a selected rule up or down in the list.
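The ordering effect described above can be illustrated with a small Python model. This is only a sketch of sequential, first-match processing under simplified assumptions; the Rule class and apply_rules function are hypothetical names invented for illustration and say nothing about how Evolution is implemented internally.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """Hypothetical stand-in for one filter rule: an If test and a target folder."""
    name: str
    matches: Callable[[dict], bool]  # the If statement
    folder: str                      # the Move to Folder action
    stop: bool = True                # whether the rule ends with Stop Processing


def apply_rules(message: dict, rules: list) -> str:
    """Run the rules top to bottom and return the folder the message ends up in."""
    folder = "Inbox"
    for rule in rules:
        if rule.matches(message):
            folder = rule.folder
            if rule.stop:
                break  # later rules never see this message
    return folder


domain_block = Rule("block domain", lambda m: "@address.com" in m["sender"], "Spam")
friend = Rule("keep friend", lambda m: m["sender"] == "person@address.com", "Friends")
msg = {"sender": "person@address.com"}

print(apply_rules(msg, [domain_block, friend]))  # Spam
print(apply_rules(msg, [friend, domain_block]))  # Friends
```

With the domain rule listed first, the more specific rule never runs, which is the situation the up and down buttons are meant to fix.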
When you first write a list of rules, you'll probably need to debug them. If you do, opening View > Message Display > Show Email Source can help by showing the message headers that the normal view conceals.
Creating basic spam filters
The simplest spam filters in Evolution are whitelists and blacklists -- that is, lists of addresses from which you will or will not accept email. If you have a particular person in mind for either list, then the If statement is: Sender is [email address]. If you want to include an entire domain, then the If statement can be either Sender contains [domain name] or Sender ends with [domain name]. Set each rule to execute when any criteria are met, and you need to create only two rules, each a list of addresses or domains.
For most people, a whitelist is easy to set up. Place the whitelist at the top of the filter rules, and you won't miss any essential email. By contrast, because spammers frequently change addresses, a blacklist is likely to require constant updating. This updating defeats the utility of creating rules by forcing you to spend far more time dealing with spam than you care to.
For this reason, rules that identify characteristics of spam rather than email sources are likely to be more useful. For example, emails that list your address as both sender and recipient are likely to be spam, so you could create a rule to send them to the Spam folder using two If statements: Sender contains [email address] and Recipients contains [email address].
Similarly, these two If statements create a filter based on size: Size (kb) is greater than 60 and Attachments do not exist. Since 60 kilobytes is about 9,000 words, that size should accommodate any mailing list digests you receive. If you don't subscribe to any mailing list digests, you can adjust the number to 20 or even lower. Either way, you can usually assume that a larger message without attachments will be spam full of graphics.
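For readers who want to try these two characteristic-based checks on saved messages outside Evolution, here is a rough Python sketch using the standard email module. The address me@example.org and both function names are hypothetical. The sketch assumes that Recipients corresponds to the To and Cc headers and that a kilobyte is 1,024 bytes, neither of which is confirmed for Evolution.

```python
from email import message_from_string
from email.message import Message

MY_ADDRESS = "me@example.org"  # hypothetical address, replace with your own


def self_addressed(msg: Message, address: str) -> bool:
    """True if the address appears in both the sender and the recipient headers."""
    sender = msg.get("From", "").lower()
    recipients = " ".join(msg.get_all("To", []) + msg.get_all("Cc", [])).lower()
    return address.lower() in sender and address.lower() in recipients


def large_without_attachments(raw: str, msg: Message, limit_kb: int = 60) -> bool:
    """True if the raw message exceeds limit_kb and no part is marked as an attachment."""
    has_attachment = any(
        part.get_content_disposition() == "attachment" for part in msg.walk()
    )
    return len(raw.encode("utf-8")) > limit_kb * 1024 and not has_attachment


raw = "From: me@example.org\nTo: me@example.org\nSubject: hi\n\nbody\n"
msg = message_from_string(raw)
print(self_addressed(msg, MY_ADDRESS))
print(large_without_attachments(raw, msg))
```

Running checks like these over a folder of known spam is one way to judge whether a threshold such as 60 suits your own mail before committing it to a filter rule.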
Other useful spam filters include a search for:
click here or Cialis
Pipe to Program grep 'name=.*\.\(exe\|scr\|bat\|pif\)'
Many of these searches can be defined by If statements within a single rule.
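To see what the grep pattern above catches before trusting it in a filter, you can test an equivalent expression in Python. The sketch below translates the basic regular expression into Python syntax, where grouping and alternation are written without backslashes; the function name is hypothetical.

```python
import re

# Python spelling of the grep pattern 'name=.*\.\(exe\|scr\|bat\|pif\)'
EXECUTABLE_ATTACHMENT = re.compile(r"name=.*\.(exe|scr|bat|pif)")


def has_executable_attachment(source: str) -> bool:
    """True if any line of the message source names a file with a listed extension."""
    # Like grep, "." does not cross line boundaries, so matching is per line.
    return EXECUTABLE_ATTACHMENT.search(source) is not None


print(has_executable_attachment('Content-Type: application/octet-stream; name="invoice.pif"'))
print(has_executable_attachment('Content-Type: application/pdf; name="report.pdf"'))
```

Like the grep original, the pattern is case-sensitive and not anchored at the end of the file name, so it misses INVOICE.EXE and also matches a name such as notes.executive.doc.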
Depending on your correspondents, you may also want to filter by character-set (charset). For example, I don't have any correspondents who write to me in Japanese, Korean, or Cyrillic characters. However, I regularly receive spam that uses these character sets. For that reason, setting up a filter on those character sets works for me. I also include a filter for messages that list no character set, since that can also be a sign of spam.
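As a rough illustration of the character-set check, the following Python sketch reads the charset declared by each text part of a message. The set of unwanted charsets is only an example chosen for illustration (common Japanese, Korean, and Cyrillic encodings), not a list taken from my filters, and the function names are hypothetical.

```python
from email import message_from_string
from email.message import Message

# Illustrative examples only: one Japanese, one Korean, two Cyrillic encodings
UNWANTED_CHARSETS = {"iso-2022-jp", "euc-kr", "koi8-r", "windows-1251"}


def declared_charsets(msg: Message) -> set:
    """Collect the charset of every text part; None means no charset was declared."""
    return {
        part.get_content_charset()
        for part in msg.walk()
        if part.get_content_maintype() == "text"
    }


def suspicious_charset(msg: Message) -> bool:
    """True if a text part uses an unwanted charset or declares none at all."""
    charsets = declared_charsets(msg)
    return None in charsets or bool(charsets & UNWANTED_CHARSETS)


cyrillic = message_from_string('Content-Type: text/plain; charset="KOI8-R"\n\nhello\n')
print(suspicious_charset(cyrillic))
```

In Evolution itself, the equivalent is a set of If statements that search the message source for each charset name.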
You can also search for HTML tags that are more likely to be used in spam. Filtering out all HTML email by searching for text/html seems a bit drastic, although some purists might consider it. More practically, you might consider setting up If statements that search the message body for:
font size= "+
tbody
#0000CC or #FFFF00
Even if you have correspondents who insist on HTML email, these tags are still unlikely to be in the average legitimate email. If necessary, you can add such correspondents to a whitelist.
Evolution filters cannot check for all the signs that anti-spam software can detect. For instance, they cannot, as far as I can figure, assess the percentage of the message body that is in HTML, or determine that Microsoft Outlook is falsely identified as the mailer. Nor can Evolution filters evaluate the likelihood that a message is spam. Yet, with ingenuity and a study of results obtained from other anti-spam software, you might manage to filter 90-95% of your spam without any other measure.
Running SpamAssassin with Evolution
If simple filters aren't effective enough, Evolution has built-in support for SpamAssassin. You can tap this support by building a filter that pipes to SpamAssassin. A standard package in most distributions, SpamAssassin installs spamd, a daemon for detecting and marking spam, and spamc, the SpamAssassin client.
As with any anti-spam program, you can tweak SpamAssassin to improve its detection. If you choose, you can maintain a blacklist and whitelist in /etc/spamassassin/local.cf instead of within Evolution. However, there is no great advantage to doing so -- and none whatsoever in maintaining lists in both programs.
Instead, you can use the command line utility sa-learn to train SpamAssassin to detect either spam or ham (SpamAssassin's name for non-spam) by pointing to a directory. For example, if SpamAssassin is missing spam that other filters are catching, you can improve its detection of spam with the command sa-learn --spam /home/[user]/.evolution/mail/local/Spam. After running, sa-learn returns a message summarizing the results. This utility is especially useful when you're switching from a mail reader that has already collected messages labeled as spam. If you point sa-learn at the folder that contains the spam, you can start using SpamAssassin in Evolution with the program already mostly trained.
To create the Evolution filter for SpamAssassin, use the If statement Pipe to Program spamassassin -e does not return 0. This statement runs SpamAssassin and causes it to exit with a nonzero exit code when spam is detected. The online documentation for SpamAssassin suggests replacing spamassassin with spamc in scripts to improve performance, and several guides on the Internet echo this advice. However, in Evolution 2.0.4 on Debian, this substitution disables the filter, so it is perhaps best avoided in Evolution in general.
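The exit-code test that this If statement relies on can be reproduced outside Evolution. The Python sketch below pipes a raw message to a command and treats any nonzero exit code as spam, which is the same reading as does not return 0. The function name is hypothetical, and FAKE_SPAM_CHECK is a stand-in command that always exits with 1, included only so the function can be tried on a machine without SpamAssassin installed.

```python
import subprocess
import sys

# Hypothetical stand-in that reads its input and always exits with code 1
FAKE_SPAM_CHECK = (sys.executable, "-c", "import sys; sys.stdin.buffer.read(); sys.exit(1)")


def is_spam(raw_message: bytes, command=("spamassassin", "-e")) -> bool:
    """Pipe the message to the command; a nonzero exit code is read as spam."""
    result = subprocess.run(
        list(command),
        input=raw_message,
        stdout=subprocess.DEVNULL,  # discard the rewritten message
        stderr=subprocess.DEVNULL,
    )
    return result.returncode != 0


print(is_spam(b"Subject: test\n\nbody\n", FAKE_SPAM_CHECK))  # True
```

One caveat of this reading is that a command that fails for an unrelated reason also returns a nonzero code, so a misconfigured installation could flag every message.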
You can use the usual Then statements to mark each flagged message as read and to stop processing it. However, for the SpamAssassin filter, you have no need to specify the folder to which the message is moved. When you create a pipe to SpamAssassin, Evolution moves the messages marked as spam, and only those, to the Junk folder.
Once you've trained SpamAssassin, you may decide to remove the other filters. However, it is less effort and more effective to use all the filters mentioned here together. If my results are any indication, between the SpamAssassin filters and the basic ones, you should be able to detect more than 95% of incoming spam -- a much higher result than with Thunderbird's built-in spam detection.
Fine-tuning
Once your filters are in place, expect to spend two or three days adjusting them. Unless you have a savagely logical mind and an eidetic memory, chances are that the combination of filters will yield results you didn't anticipate. It's also easy to make mistakes when selecting from drop-down lists. Expect to correct the filters and juggle the order in which they are applied. If you apply all the filters listed here, you will probably have to decide which filters to delete because together they flag nearly all of your incoming emails.
As you fine-tune, one point worth remembering is this: a filter that moves a non-spam message to Evolution's Inbox does not stop other filters from being applied to it -- even if you have used the Stop Processing Then statement. Since all messages pass through the Inbox, sending a message to it simply returns it to the processing queue. The solution is to leave out the move to the Inbox and use Stop Processing as the rule's only Then statement.
Similarly, you may notice that SpamAssassin treats mailing lists as spam. But if you create rules that send mailing list emails to specific folders and place the rules above the SpamAssassin filter, then you should be able to solve any problems. Joe Barr's "Nine tips for filtering messages in Novell Evolution" offers additional clues for solving problems that you may encounter.
If the filters suggested here don't give you a spam-free life, you can enlist additional help. Procmail provides more sophisticated filters than Evolution itself, and it is very effective when combined with the anti-spam program Bogofilter. The drawback to these programs is that they are not supported directly by Evolution, and much of the configuration needs to be done outside of Evolution. Still, many users, especially professional system administrators, prefer an in-depth defense that simultaneously deploys several different anti-spam programs.
However, the filters listed here may be all that a home or small business user needs. For me, debugging them was more a matter of eliminating false positives than of spam slipping through the net.
Spam is a shifting target, and I expect I'll need to make more changes in the future, especially retraining SpamAssassin. But for now, aside from quick checks each day for false positives, I'm enjoying the blissful luxury of pretending that spam no longer exists.
Bruce Byfield is a computer journalist, course designer, and instructor who contributes regularly to NewsForge and Linux Journal.