Filtering spam in Novell Evolution

Author: Bruce Byfield

When I switched to Novell Evolution, finding an anti-spam solution became
a top priority. Having warmed to Evolution after noticing that its interface
was no longer an imitation of Microsoft Outlook, I quickly learned to appreciate
its centralized mail and business tools. Spoiled by Mozilla Thunderbird’s
built-in spam detection, I wanted some equivalent in Evolution.

Evolution’s filtering tools for handling incoming messages provide
the raw material for spam detection. However, the filters have difficulty
knowing which characteristics of incoming mail should be treated as signs
of spam. Information I gleaned from the Internet was only moderately useful; most of it was incomplete, obsolete, or inaccurate.

To find a solution, I pored over the headers of messages that Mozilla Thunderbird
had detected as spam. From this research, I isolated the most common characteristics
of spam and built several filters without leaving Evolution. Wanting to further improve spam detection, I spent several evenings testing various instructions
for linking Evolution with SpamAssassin through a filter until I found one that
worked. Taken together, these filters provided all the spam filtering I needed to
remove my last obstacle to using Evolution.


Evolution’s filtering tools

Evolution’s filter rules are created from Tools > Filters > Add. Each rule pairs If statements, which set the conditions under which the rule applies, with Then statements, which define what happens when those conditions are met. You create both If and Then statements by selecting
the Add button in the appropriate pane, then selecting from drop-down boxes or
typing in fields. By default, a rule applies when all If statements are met, but you can also set the Execute action drop-down list to if any criteria are met. For convenience, give each rule an appropriate name, so you don’t have to open it to know what the rule does.

For spam filters, If statements have many possibilities. Focusing on a specific message header, such as Sender, Subject, or Recipients, or on a message’s contents, If statements can detect
an exact phrase using the is building block, or part of a phrase using contains,
starts with, or ends with. You can also introduce a degree
of fuzziness by using sounds like or regular expressions. Many of these search
patterns also have an opposite, such as is not. If none of these patterns, alone or
in combination, does what you want, you can select Pipe to Program to call a
shell command, such as grep.

By contrast, you’ll need only a limited number of Then statements. Before you write Then statements,
you’ll want to create a Spam folder; Evolution comes with a Junk folder, but, in Debian,
you can’t use it when writing a filter rule from the GUI, not even if you modify /home/[user]/.evolution/mail/filters.xml
in a text editor. For each spam rule, you’ll want to use three Then statements in this order:


  1. Move to Folder Spam

  2. Set Status Read

  3. Stop Processing

This set of Then statements delivers the email to the Spam folder and marks it as read, so you aren’t distracted by the appearance of unread messages in the folder. It then keeps other filters from being
applied to the same message and delivering it to other folders. Once you’ve tested your rules together, you may want to change the second statement to Set Status Deleted. However, until you’re confident with your set of rules, this choice may cause you to lose legitimate messages.

While testing the results, you may want to assign a color or a sound to each piece of email marked by a rule, so you can see how many messages it is catching.

Completed rules are listed in the Tools > Filter window. Evolution processes them sequentially, so the order in which they’re listed in the Filter window can affect how useful a rule is. For example, a rule that moves messages from person@address.com to a
particular folder will never come into effect if a rule that is processed first deletes all mail in which the sender’s address contains @address.com. Since combinations of rules can have
unexpected results, the Filter window includes buttons for moving a selected rule up or down in the list.
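
The ordering effect can be sketched with a shell case statement, which acts on the first pattern that matches, much like a rule list in which every rule ends with Stop Processing. This is only an illustration of the logic, not of how Evolution is implemented; the function name route and the folder name Friends are made up for the example.

```shell
#!/bin/sh
# Illustration only: first match wins, as in a rule list where every
# rule ends with Stop Processing. "route" and "Friends" are made-up names.
route() {
    case "$1" in
        person@address.com) echo "Friends" ;;  # specific rule listed first
        *@address.com)      echo "Spam" ;;     # broad domain rule listed second
        *)                  echo "Inbox" ;;    # no rule matched
    esac
}

route person@address.com   # the specific rule fires before the domain rule
route other@address.com    # only the domain rule matches
```

If the first two patterns are swapped, every address at address.com, including person@address.com, ends up in Spam, which is the failure described above.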

When you first write a list of rules, you’ll probably need to debug them. If you do,
opening View > Message Display > Show Email Source can help by showing the message
headers that the normal view conceals.


Creating basic spam filters

The simplest spam filters in Evolution are whitelists and blacklists: that is, lists of addresses from which you will or will not accept email. If you have a particular person in mind for either list,
then the If statement is: Sender is [email address]. If you want to include an entire domain, then the If statement can be either Sender contains [domain name] or
Sender ends with [domain name]. Set each rule to execute if any criteria are met, and you need only two rules, one per list, each holding all of its addresses or domains.

For most people, a whitelist is easy to set up. Place the whitelist at the top of the filter rules, and you won’t miss any essential email. By contrast, because spammers frequently change addresses, a blacklist is likely to require constant updating. This updating defeats the utility of creating rules by forcing you to spend far more time dealing with spam than you care to.

For this reason, rules that identify characteristics of spam rather than
email sources are likely to be more useful. For example, emails that list your address as
both sender and recipient are likely to be spam, so you could create a
rule to send them to the Spam folder using two If statements:
Sender contains [email address] and Recipients contains [email address].
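
The same test can be sketched outside Evolution as a small script of the kind you might call through Pipe to Program. It assumes the message arrives on standard input. MY_ADDRESS is a placeholder and self_addressed is a hypothetical helper name. For brevity it checks only the To header, not Cc, and ignores headers folded across several lines.

```shell
#!/bin/sh
# Sketch: prints "suspect" when the From and To headers both contain
# the given address. MY_ADDRESS is a placeholder, and self_addressed
# is a hypothetical helper name. Reads one message on standard input.
MY_ADDRESS="me@example.com"

self_addressed() {
    # Keep only the header block: everything up to the first blank line.
    headers=$(sed '/^$/q')
    if printf '%s\n' "$headers" | grep -i '^From:' | grep -qiF "$MY_ADDRESS" &&
       printf '%s\n' "$headers" | grep -i '^To:'   | grep -qiF "$MY_ADDRESS"
    then
        echo "suspect"
    else
        echo "ok"
    fi
}
```

Restricting the search to the header block keeps an address quoted in the message body from triggering the check.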

Similarly, these two If statements create a filter based on size: Size (kb) is greater than 60 and Attachments do not exist. Since 60 kilobytes is about 9,000 words, that size should accommodate any mailing list digests you receive. If you don’t subscribe to any mailing list digests, you can adjust the number to 20 or even lower. Either way, you can usually assume that a larger message without attachments will be spam full of graphics.
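
As a rough sketch of what this pair of conditions tests, the script below reads a message on standard input and reports whether it is over the limit and has no attachment. It assumes a kilobyte of 1,024 bytes and treats a Content-Disposition: attachment header as the sign of an attachment; Evolution’s own size and attachment checks may work differently. The name large_without_attachment is hypothetical.

```shell
#!/bin/sh
# Sketch: classify a message on standard input by size and attachments.
# The 60 KB limit follows the text; large_without_attachment is a
# hypothetical name, and the attachment test is a rough heuristic.
LIMIT_KB=60

large_without_attachment() {
    tmp=$(mktemp) || return 2
    cat > "$tmp"
    # Arithmetic expansion strips any padding that wc adds to the count.
    bytes=$(($(wc -c < "$tmp")))
    if [ "$bytes" -gt $((LIMIT_KB * 1024)) ] &&
       ! grep -qi '^Content-Disposition:[[:space:]]*attachment' "$tmp"
    then
        echo "suspect"
    else
        echo "ok"
    fi
    rm -f "$tmp"
}
```

Lowering LIMIT_KB to 20 corresponds to the stricter setting suggested for people who receive no mailing list digests.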

Other useful spam filters include a search for:

  • Words and phrases likely to be evidence of spam, such as click here or Cialis
  • Windows executables as an attachment. This search could be a list of If statements containing the extensions of
    Windows executables, or simply Pipe to Program grep -E 'name=.*\.(exe|scr|bat|pif)'.
  • A date more than 96 hours before the message was received. Such a date could indicate a relayed message.
  • An empty Reply-To header

Many of these searches can be defined by If statements within a single rule.
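
The check for executable attachments can be tried at a shell prompt before you put it in a rule. In the sketch below, grep needs -E for the (a|b) alternation to work, the backslash makes the dot literal, and -i ignores case. The trailing group, which is an addition of this sketch, keeps a name such as report.executive.pdf from matching. The function name has_executable is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether a message on standard input names a Windows
# executable attachment. has_executable is a hypothetical helper name.
has_executable() {
    # After the extension, require a quote, semicolon, whitespace,
    # or the end of the line.
    if grep -Eiq 'name=.*\.(exe|scr|bat|pif)([";[:space:]]|$)'; then
        echo "executable"
    else
        echo "clean"
    fi
}
```

On its own, grep exits with status 0 when it finds a match, so a Pipe to Program rule built on it would presumably test for a return value of 0.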

Depending on your correspondents, you may also want to filter by character-set (charset). For example, I don’t have any correspondents who write to me in Japanese, Korean, or Cyrillic characters. However, I regularly receive spam that uses these character sets. For that reason, setting up a filter on those character sets works
for me. I also include a filter for messages that list no character set, since that can also be a sign of spam.
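
A character-set check can be sketched with grep in the same way. The names below are common labels for Japanese, Korean, and Cyrillic text, chosen as examples rather than taken from my own filters, and the list is not complete. The function name charset_verdict is hypothetical, and the message is assumed to arrive on standard input.

```shell
#!/bin/sh
# Sketch: label a message on standard input by its declared character set.
# The charset names are common examples for Japanese, Korean, and Cyrillic
# text, not a complete list; charset_verdict is a hypothetical name.
charset_verdict() {
    msg=$(cat)
    if ! printf '%s\n' "$msg" | grep -qi 'charset='; then
        echo "no-charset"
    elif printf '%s\n' "$msg" |
         grep -Eqi 'charset="?(iso-2022-jp|shift_jis|euc-jp|euc-kr|koi8-r|windows-1251)'
    then
        echo "unwanted-charset"
    else
        echo "ok"
    fi
}
```

A message encoded as UTF-8 can carry any script, so a check like this catches only messages that declare one of the listed character sets.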

You can also search for HTML tags that are more likely to be used in spam. Filtering out all HTML email by searching for text/html seems a bit drastic, although some purists might consider it. More practically, you might consider setting up If statements that search the message body for:

  • large and extra large fonts (font size= "+)
  • tables (tbody)
  • red or blue text (#FF0000, #0000CC)

Even if you have correspondents who insist on HTML email, these tags are still unlikely to be in the average legitimate email. If necessary, you can add such correspondents to a whitelist.

Evolution filters cannot check for all the signs that anti-spam software can detect. For instance, they cannot, as far as I can figure, assess the percentage of the message body that is in HTML, or determine that
Microsoft Outlook is falsely identified as the mailer. Nor can Evolution filters evaluate the likelihood that a message is spam. Yet, with ingenuity and a study of results obtained from other anti-spam
software, you might manage to filter 90-95% of your spam without any other measure.


Running SpamAssassin with Evolution

If simple filters aren’t effective enough, Evolution has built-in support for
SpamAssassin. You can tap
this support by building a filter that pipes to SpamAssassin. A standard
package in most distributions, SpamAssassin installs spamd, a daemon for detecting and marking spam, and spamc, the SpamAssassin client.

As with any anti-spam program, you can tweak SpamAssassin to improve its detection.
If you choose, you can maintain a blacklist and whitelist in /etc/spamassassin/local.cf
instead of within Evolution. However, there is no great advantage to doing so — and
none whatsoever in maintaining lists in both programs.

Instead, you can use the command line utility sa-learn to train
SpamAssassin to detect either spam or ham (SpamAssassin’s name for non-spam)
by pointing to a directory. For example, if SpamAssassin is missing spam that
other filters are catching, you can improve its detection of spam with the command
sa-learn --spam /home/[user]/.evolution/mail/local/Spam. After running, sa-learn returns a message summarizing the results. This utility is especially useful when you’re switching from a mail reader that has already collected messages labeled as spam. If you point sa-learn at the folder that contains the spam, you can start using SpamAssassin in Evolution with the program already mostly trained.

To create the Evolution filter for SpamAssassin, use the If statement
Pipe to Program spamassassin -e does not return 0. This statement runs SpamAssassin, which, with the -e option, exits with a nonzero exit code when it detects spam. The online documentation for SpamAssassin suggests replacing spamassassin with spamc in scripts to improve performance, and several guides on the Internet echo this advice. However, in Evolution 2.0.4 on Debian, this substitution disables the filter, so it is perhaps best avoided
in Evolution in general.
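
The logic of a does not return 0 condition can be sketched in a few lines of shell. The helper below runs whatever command it is given, passes the message through on standard input, and treats any nonzero exit status as a hit. The name flagged_by and the file name message.eml are placeholders for illustration.

```shell
#!/bin/sh
# Sketch: treat a nonzero exit status as "spam", the way a
# "does not return 0" condition would. flagged_by is a hypothetical
# helper; the message is passed through on standard input.
flagged_by() {
    if "$@" > /dev/null 2>&1; then
        echo "not flagged"   # the program exited with status 0
    else
        echo "flagged"       # any nonzero exit status
    fi
}

# With SpamAssassin installed, a saved message could be checked like this
# (message.eml is a placeholder file name):
#   flagged_by spamassassin -e < message.eml
```

In a sketch like this, a command that is missing or fails to start also exits with a nonzero status, so a broken installation would flag every message. Checking a few known legitimate messages first guards against that.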

You can use the usual Then statements to mark each flagged message as read and to
stop processing it. However, for the SpamAssassin filter, you have no need to specify the directory to which the message is moved. When you create a pipe to SpamAssassin, Evolution will move only messages marked as spam to the Junk folder.

Once you’ve trained SpamAssassin, you may decide to remove the other filters. However, it is less effort and more effective to use all the filters mentioned here together. If my results are any indication, between the SpamAssassin filters and the basic ones, you should be able to detect more than 95% of incoming spam — a much higher result than with Thunderbird’s built-in spam detection.


Fine-tuning

Once your filters are in place, expect to spend two or three days adjusting
them. Unless you have a savagely logical mind and an eidetic memory, chances
are that the combination of filters will yield results you didn’t anticipate.
It’s also easy to make mistakes when selecting from drop-down lists. Expect
to correct the filters and juggle the order in which they are applied. If
you apply all the filters listed here, you will probably have to decide which
filters to delete because together they flag nearly all of your incoming
emails.

As you fine-tune, one point is worth remembering: a filter that moves a non-spam message to Evolution’s Inbox does not stop other filters from being applied to it, even if you have used the Stop Processing Then statement. Since all messages pass through the Inbox, sending a message to it simply returns it to the processing queue. The solution is to leave out the move to the Inbox and use only Stop Processing.

Similarly, you may notice that SpamAssassin treats mailing lists as spam.
But if you create rules that send mailing list emails to specific folders
and place the rules above the SpamAssassin filter, then you should be able
to solve any problems. Joe Barr’s Nine tips for filtering messages in Novell Evolution offers additional clues for solving problems that you may encounter.

If the filters suggested here don’t give you a spam-free life, you can enlist additional
help. Procmail provides more sophisticated filters than Evolution itself, and it is
very effective when combined with the anti-spam program Bogofilter.
The drawback to these programs is that they are not supported directly by Evolution,
and much of the configuration needs to be done outside of Evolution. Still, many users, especially
professional system administrators, prefer an in-depth defense that simultaneously deploys several
different anti-spam programs.

However, the filters listed here may be all that a home or small business
user needs. For me, debugging them was more a matter of eliminating false
positives than of spam slipping through the net.

Spam is a shifting target, and I expect I’ll need to make more changes in the future, especially
retraining SpamAssassin. But for now, aside from quick checks each day for false positives, I’m enjoying the blissful luxury of pretending that spam no longer exists.

Bruce Byfield is a computer journalist, course designer, and instructor
who contributes regularly to NewsForge and Linux Journal.