Linux.com

Feature

Filtering spam in Novell Evolution

By Bruce Byfield on July 06, 2005 (8:00:00 AM)

When I switched to Novell Evolution, finding an anti-spam solution became a top priority. Having warmed to Evolution after noticing that its interface was no longer an imitation of Microsoft Outlook, I quickly learned to appreciate its centralized mail and business tools. Spoiled by Mozilla Thunderbird's built-in spam detection, I wanted some equivalent in Evolution.

Evolution's filtering tools for handling incoming messages provide the raw material for spam detection. However, the filters have no way of knowing on their own which characteristics of incoming mail are signs of spam; you have to tell them. The information I gleaned from the Internet was only moderately useful; most of it was incomplete, obsolete, or inaccurate.

To find a solution, I pored over the headers of messages that Mozilla Thunderbird had detected as spam. From this research, I isolated the most common characteristics of spam and built several filters without leaving Evolution. Wanting to further improve spam detection, I spent several evenings testing various instructions for linking Evolution with SpamAssassin through a filter until I found one that worked. Taken together, these filters provided all the spam filtering I needed to remove my last obstacle to using Evolution.

Evolution's filtering tools

Evolution's filter rules are created from Tools > Filters > Add. Each rule consists of one or more If statements, which set the conditions under which the rule applies, and one or more Then statements, which specify what happens when those conditions are met. You create both If and Then statements by clicking the Add button in the appropriate pane, then choosing from drop-down lists or typing in fields. By default, a rule applies only when all of its If statements are met, but you can also set the Execute action drop-down list to "if any criteria are met." For convenience, give each rule a descriptive name, so you don't have to open it to know what it does.
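To make the If/Then structure concrete, here is a minimal Python sketch of how a rule with several If statements behaves under the two Execute action settings. It is a model for illustration only, not Evolution's code: the `Rule` class, the `subject_contains` helper, and the dict-of-headers message are hypothetical, and the case-insensitive comparison is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical model: a message is just a dict of header name -> value.
Message = Dict[str, str]
Condition = Callable[[Message], bool]


@dataclass
class Rule:
    name: str                    # a descriptive name, as recommended above
    conditions: List[Condition]  # the If statements
    actions: List[str]           # the Then statements, kept as labels here
    match_any: bool = False      # False = "if all criteria are met" (the default)

    def applies_to(self, msg: Message) -> bool:
        results = (condition(msg) for condition in self.conditions)
        return any(results) if self.match_any else all(results)


def subject_contains(text: str) -> Condition:
    # Case-insensitive substring test (an assumption about the matching).
    return lambda msg: text.lower() in msg.get("Subject", "").lower()


spam_words = [subject_contains("cialis"), subject_contains("click here")]
any_rule = Rule("Spam words (any)", spam_words, ["Move to Folder Spam"], match_any=True)
all_rule = Rule("Spam words (all)", spam_words, ["Move to Folder Spam"])

msg = {"Subject": "Cheap Cialis today"}
print(any_rule.applies_to(msg))  # True: one criterion is enough
print(all_rule.applies_to(msg))  # False: "click here" is missing
```

The same two If statements catch very different amounts of mail depending on that one drop-down setting, which is worth keeping in mind when a rule catches more or less than expected.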

For spam filters, If statements have many possibilities. They can test a specific message header, such as Sender, Subject, or Recipients, or the message's contents. An If statement can detect an exact phrase with the is building block, or part of a phrase with contains, starts with, or ends with. You can also introduce a degree of fuzziness with sounds like or with regular expressions. Many of these search patterns also have an opposite, such as is not. If none of these patterns, alone or in combination, does what you want, you can select Pipe to Program to call a shell command, such as grep.
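The building blocks listed above map naturally onto ordinary string tests. The sketch below is a hypothetical Python model of them (the `OPERATORS` table and `check` helper are illustrative names, not part of Evolution). It compares case-sensitively for simplicity and leaves out sounds like, since the exact algorithm Evolution uses for it isn't covered here.

```python
import re
from typing import Callable, Dict

# Hypothetical mapping from If-statement building blocks to string tests.
OPERATORS: Dict[str, Callable[[str, str], bool]] = {
    "is": lambda value, pattern: value == pattern,
    "is not": lambda value, pattern: value != pattern,
    "contains": lambda value, pattern: pattern in value,
    "does not contain": lambda value, pattern: pattern not in value,
    "starts with": lambda value, pattern: value.startswith(pattern),
    "ends with": lambda value, pattern: value.endswith(pattern),
    "matches regex": lambda value, pattern: re.search(pattern, value) is not None,
}


def check(headers: Dict[str, str], header: str, operator: str, pattern: str) -> bool:
    """Apply one If statement to one header; a missing header counts as empty."""
    return OPERATORS[operator](headers.get(header, ""), pattern)


headers = {"From": "deals@bulk.example", "Subject": "Click here now"}
print(check(headers, "From", "ends with", "@bulk.example"))             # True
print(check(headers, "Subject", "matches regex", r"(?i)click\s+here"))  # True
print(check(headers, "Subject", "is", "click here now"))                # False
```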

By contrast, you'll need only a few Then statements. Before you write them, create a Spam folder: Evolution comes with a Junk folder, but in Debian you can't use it when writing a filter rule from the GUI, not even if you modify /home/[user]/.evolution/mail/filters.xml in a text editor. For each spam rule, use these three Then statements in this order:

  1. Move to Folder Spam
  2. Set Status Read
  3. Stop Processing

This set of Then statements delivers the email to the Spam folder and marks it as read, so you aren't distracted by the appearance of unread messages in the folder. It then keeps other filters from being applied to the same message and delivering it to other folders. Once you've tested your rules together, you may want to change the second statement to Set Status Deleted. However, until you're confident with your set of rules, this choice may cause you to lose legitimate messages.

While testing the results, you may want to assign a color or a sound to each piece of email marked by a rule, so you can see how many messages it is catching.

Completed rules are listed in the Tools > Filters window. Evolution processes them sequentially, so the order in which they're listed can affect how useful a rule is. For example, a rule that moves messages from person@address.com to a particular folder will never take effect if a rule processed earlier deletes all mail whose sender's address contains @address.com. Since combinations of rules can have unexpected results, the Filters window includes buttons for moving a selected rule up or down in the list.
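The ordering problem is easy to see in a small simulation. This Python sketch is a simplified, hypothetical model of sequential rule processing (the `Rule` fields and `apply_rules` function are illustrative, not Evolution internals): each matching rule can move the message, mark it read, and stop processing, like the three Then statements recommended earlier. To keep the output readable, the blocking rule moves mail to Spam instead of deleting it. Running the two rules from the example above in both orders shows the earlier rule winning.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical model: a message is a dict of header name -> value.
Message = Dict[str, str]


@dataclass
class Rule:
    name: str
    matches: Callable[[Message], bool]
    folder: Optional[str] = None  # Move to Folder
    mark_read: bool = False       # Set Status Read
    stop: bool = False            # Stop Processing


def apply_rules(rules: List[Rule], msg: Message) -> dict:
    """Run rules top to bottom and report where the message ends up."""
    state = {"folder": "Inbox", "read": False, "fired": []}
    for rule in rules:
        if not rule.matches(msg):
            continue
        state["fired"].append(rule.name)
        if rule.folder is not None:
            state["folder"] = rule.folder
        if rule.mark_read:
            state["read"] = True
        if rule.stop:
            break  # later rules never see this message
    return state


block_domain = Rule("Block address.com", lambda m: "@address.com" in m.get("From", ""),
                    folder="Spam", mark_read=True, stop=True)
friend = Rule("Friend", lambda m: m.get("From") == "person@address.com",
              folder="Friends", stop=True)

msg = {"From": "person@address.com"}
print(apply_rules([block_domain, friend], msg)["folder"])  # Spam
print(apply_rules([friend, block_domain], msg)["folder"])  # Friends
```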

When you first write a list of rules, you'll probably need to debug them. If you do, opening View > Message Display > Show Email Source can help by showing the message headers that the normal view conceals.

Creating basic spam filters

The simplest spam filters in Evolution are whitelists and blacklists -- that is, lists of addresses from which you will or will not accept email. To put a particular person on either list, use the If statement Sender is [email address]. To include an entire domain, use either Sender contains [domain name] or Sender ends with [domain name]. If you set each rule to execute when any criteria are met, you need only two rules, each containing a list of addresses or domains.
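A whitelist or blacklist set to match any criteria amounts to a simple membership test on the sender's address. In the hypothetical Python sketch below, entries beginning with @ stand for whole domains (Sender ends with) and all other entries for single addresses (Sender is); that convention and the `on_list` name are illustrative assumptions, not Evolution features.

```python
from email.utils import parseaddr
from typing import Iterable


def on_list(from_header: str, entries: Iterable[str]) -> bool:
    """Return True if the sender matches any entry ("any criteria" semantics).

    Hypothetical convention: "@domain" entries match a whole domain,
    anything else must equal the full address.
    """
    # parseaddr() drops the display name: "Jane <jane@x.org>" -> "jane@x.org"
    address = parseaddr(from_header)[1].lower()
    for entry in entries:
        entry = entry.lower()
        if entry.startswith("@"):
            if address.endswith(entry):  # whole domain: Sender ends with
                return True
        elif address == entry:           # single person: Sender is
            return True
    return False


WHITELIST = ["boss@example.org", "@lists.debian.org"]
print(on_list("The Boss <Boss@Example.org>", WHITELIST))   # True
print(on_list("debian-user@lists.debian.org", WHITELIST))  # True
print(on_list("boss@example.org.evil.test", WHITELIST))    # False
```

The sketch uses ends with for domains because a contains test for @address.com would also match an address such as someone@address.com.evil.example.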

For most people, a whitelist is easy to set up. Place the whitelist at the top of the filter rules, and you won't miss any essential email. By contrast, because spammers frequently change addresses, a blacklist is likely to require constant updating. This updating defeats the utility of creating rules by forcing you to spend far more time dealing with spam than you care to.

For this reason, rules that identify characteristics of spam rather than sources of email are likely to be more useful. For example, messages that list your address as both sender and recipient are likely to be spam, so you could send them to the Spam folder with a rule containing two If statements: Sender contains [email address] and Recipients contains [email address].
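The two-statement rule above can be sketched in Python with the standard email package. This is an illustration under assumptions: `looks_self_addressed` is a hypothetical name, it compares parsed addresses exactly (slightly stricter than contains), and it counts both To and Cc as recipients.

```python
from email import message_from_string
from email.message import Message
from email.utils import getaddresses


def looks_self_addressed(msg: Message, my_address: str) -> bool:
    """True when my_address appears both as a sender and as a recipient."""
    me = my_address.lower()
    senders = {addr.lower() for _, addr in getaddresses(msg.get_all("From", []))}
    recipients = {addr.lower() for _, addr in
                  getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))}
    return me in senders and me in recipients  # both If statements must match


spam = message_from_string("From: me@example.org\nTo: me@example.org\nSubject: hi\n\nbody\n")
print(looks_self_addressed(spam, "Me@Example.org"))  # True
```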

Similarly, these two If statements create a filter based on size: Size (kb) is greater than 60 and Attachments do not exist. Since 60 kilobytes is about 9,000 words, that size should accommodate any mailing list digests you receive. If you don't subscribe to any mailing list digests, you can adjust the number to 20 or even lower. Either way, you can usually assume that a larger message without attachments will be spam full of graphics.
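As a rough model of this rule, the Python sketch below flags a raw message that is larger than 60 KB and contains no part marked as an attachment. Two details are assumptions rather than documented Evolution behavior: a kilobyte is taken as 1,024 bytes, and only parts whose Content-Disposition is attachment count as attachments. The function names are hypothetical.

```python
from email import message_from_bytes

SIZE_LIMIT_KB = 60  # the threshold suggested above; try 20 if you get no digests


def has_attachments(raw: bytes) -> bool:
    # Assumption: an attachment is any part with "Content-Disposition: attachment".
    msg = message_from_bytes(raw)
    return any(part.get_content_disposition() == "attachment" for part in msg.walk())


def oversized_without_attachments(raw: bytes, limit_kb: int = SIZE_LIMIT_KB) -> bool:
    size_kb = len(raw) / 1024  # assumption: 1 KB = 1,024 bytes
    return size_kb > limit_kb and not has_attachments(raw)


# About 88 KB of HTML and no attachments.
html_spam = b"From: a@b.example\nContent-Type: text/html\n\n" + b"<p>BUY NOW</p>\n" * 6000
print(oversized_without_attachments(html_spam))  # True
```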

Other useful spam filters include a search for:

  • Words and phrases likely to be evidence of spam, such as click here or Cialis
  • Windows executables as attachments. This search could be a list of If statements containing the extensions of Windows executables, or simply Pipe to Program grep 'name=.*\.\(exe\|scr\|bat\|pif\)' returns 0 (grep exits with 0 when it finds a match).
  • A date more than 96 hours before the message was received. Such a date could indicate a relayed message.
  • An empty Reply-To header
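The last three items in this list can be prototyped outside Evolution before you commit them to rules. The Python sketch below is illustrative only: the function names are hypothetical, the regular expression is a Python translation of the grep pattern above, and treating a Reply-To header that is present but blank as empty (while ignoring a missing one) is a judgment call, since many legitimate messages have no Reply-To at all.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping

# Python translation of: grep 'name=.*\.\(exe\|scr\|bat\|pif\)'
EXECUTABLE = re.compile(r"name=.*\.(exe|scr|bat|pif)")


def has_windows_executable(raw_text: str) -> bool:
    # "." does not match newlines, so each match stays within one line, as with grep.
    return EXECUTABLE.search(raw_text) is not None


def date_too_old(date_header: str, received: datetime, hours: int = 96) -> bool:
    # `received` must be timezone-aware.
    sent = parsedate_to_datetime(date_header)
    if sent.tzinfo is None:  # a "-0000" zone yields a naive datetime
        sent = sent.replace(tzinfo=timezone.utc)
    return received - sent > timedelta(hours=hours)


def empty_reply_to(headers: Mapping[str, str]) -> bool:
    # Judgment call: present-but-blank counts as empty; a missing header does not.
    value = headers.get("Reply-To")
    return value is not None and value.strip() == ""


received = datetime(2005, 7, 6, 12, 0, tzinfo=timezone.utc)
print(has_windows_executable('Content-Type: application/octet-stream; name="invoice.pif"'))  # True
print(date_too_old("Fri, 01 Jul 2005 08:00:00 +0000", received))  # True: 124 hours earlier
print(empty_reply_to({"Reply-To": " "}))  # True
```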

Many of these searches can be defined by If statements within a single rule.

Depending on your correspondents, you may also want to filter by character-set (charset). For example, I don't have any correspondents who write to me in Japanese, Korean, or Cyrillic characters. However, I regularly receive spam that uses these character sets. For that reason, setting up a filter on those character sets works for me. I also include a filter for messages that list no character set, since that can also be a sign of spam.
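In code, this check amounts to reading the charset parameter of each text part. The Python sketch below is a hypothetical illustration using the standard email package; the `UNWANTED` set is an example for someone with no Japanese, Korean, or Russian-language correspondents, not a complete list of the charsets those languages use, and `charset_verdict` is an illustrative name.

```python
from email import message_from_bytes

# Example block list (an assumption; edit to suit your correspondents):
# common Japanese, Korean and Cyrillic charsets, lower-cased.
UNWANTED = {"iso-2022-jp", "shift_jis", "euc-jp", "euc-kr", "ks_c_5601-1987",
            "koi8-r", "windows-1251"}


def charset_verdict(raw: bytes) -> str:
    msg = message_from_bytes(raw)
    # get_charsets() returns one lower-cased entry per part, or None where a
    # part has no charset parameter (multipart containers included).
    charsets = {c for c in msg.get_charsets() if c}
    if not charsets:
        return "no charset"
    if charsets & UNWANTED:
        return "unwanted charset"
    return "ok"


print(charset_verdict(b'Content-Type: text/plain; charset="KOI8-R"\n\nhi\n'))  # unwanted charset
```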

You can also search for HTML tags that are more likely to be used in spam. Filtering out all HTML email by searching for text/html seems a bit drastic, although some purists might consider it. More practically, you might set up If statements that search the message body for:

  • large and extra large fonts (font size="+)
  • tables (tbody)
  • blue or yellow text (#0000CC, #FFFF00)
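These body searches are plain substring tests, so they are easy to try out against a few saved messages before turning them into If statements. A minimal Python sketch follows; the marker tuple is based on the list above, while the case-insensitive comparison and the `html_markers_found` name are assumptions.

```python
from typing import List

# Substrings based on the list above, lower-cased for a case-insensitive search.
HTML_MARKERS = ('font size="+', "tbody", "#0000cc", "#ffff00")


def html_markers_found(body: str) -> List[str]:
    lowered = body.lower()
    return [marker for marker in HTML_MARKERS if marker in lowered]


print(html_markers_found('<FONT SIZE="+3" COLOR="#0000CC">BUY NOW</FONT>'))
# ['font size="+', '#0000cc']
```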

Even if you have correspondents who insist on HTML email, these tags are still unlikely to appear in the average legitimate message. If necessary, you can exempt such correspondents by adding them to your whitelist.

Evolution filters cannot check for all the signs that anti-spam software can detect. For instance, they cannot, as far as I can figure, assess the percentage of the message body that is in HTML, or determine that Microsoft Outlook is falsely identified as the mailer. Nor can Evolution filters evaluate the likelihood that a message is spam. Yet, with ingenuity and a study of results obtained from other anti-spam software, you might manage to filter 90-95% of your spam without any other measure.

Running SpamAssassin with Evolution

If simple filters aren't effective enough, you can turn to SpamAssassin, for which Evolution has built-in support. You can tap this support by building a filter that pipes messages to SpamAssassin. A standard package in most distributions, SpamAssassin installs spamd, a daemon for detecting and marking spam, and spamc, the SpamAssassin client.

As with any anti-spam program, you can tweak SpamAssassin to improve its detection. If you choose, you can maintain a blacklist and whitelist in /etc/spamassassin/local.cf instead of within Evolution. However, there is no great advantage to doing so -- and none whatsoever in maintaining lists in both programs.

Instead, you can use the command-line utility sa-learn to train SpamAssassin to recognize either spam or ham (SpamAssassin's name for non-spam) by pointing it at a mail folder. For example, if SpamAssassin is missing spam that your other filters are catching, you can improve its detection of spam with the command sa-learn --spam --mbox /home/[user]/.evolution/mail/local/Spam. The --mbox option tells sa-learn that the folder is a single mbox file, which is how Evolution 2.x stores local folders, rather than a directory of individual messages. After running, sa-learn prints a message summarizing the results. This utility is especially useful when you're switching from a mail reader that has already collected messages labeled as spam. If you point sa-learn at the folder that contains that spam, you can start using SpamAssassin in Evolution with the program already mostly trained.

To create the Evolution filter for SpamAssassin, use the If statement Pipe to Program spamassassin -e does not return 0. This statement runs SpamAssassin with the -e (--exit-code) option, which makes it exit with a non-zero code when it detects spam. The online documentation for SpamAssassin suggests replacing spamassassin with spamc in scripts to improve performance, and several guides on the Internet echo this advice. However, in Evolution 2.0.4 on Debian, this substitution disables the filter, so it is probably best avoided in Evolution in general.
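The Pipe to Program condition boils down to feeding the message to a command on standard input and testing its exit status. The Python sketch below models that with subprocess. So that it runs even where SpamAssassin isn't installed, it uses a hypothetical stand-in checker, a one-line Python script that exits 1 when the message mentions cialis. With SpamAssassin installed, you would pass ["spamassassin", "-e"] as the command instead. The `pipe_to_program` and `is_spam` names are illustrative, and this is not how Evolution itself is implemented.

```python
import subprocess
import sys
from typing import List


def pipe_to_program(command: List[str], raw_message: bytes) -> int:
    """Send the message to the command's stdin and return its exit status."""
    result = subprocess.run(command, input=raw_message,
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode


def is_spam(command: List[str], raw_message: bytes) -> bool:
    # The filter condition from the article: "does not return 0".
    return pipe_to_program(command, raw_message) != 0


# Hypothetical stand-in for "spamassassin -e": exits 1 if the message mentions cialis.
FAKE_CHECKER = [sys.executable, "-c",
                "import sys; sys.exit(1 if b'cialis' in sys.stdin.buffer.read().lower() else 0)"]

print(is_spam(FAKE_CHECKER, b"Subject: Cheap CIALIS\n\nBuy now\n"))  # True
print(is_spam(FAKE_CHECKER, b"Subject: Lunch\n\nFriday?\n"))        # False
```

Only the exit status matters for this condition, which is why the sketch discards whatever the command writes to standard output.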

You can use the usual Then statements to mark each caught message as read and to stop processing it. However, for the SpamAssassin filter, you don't need to specify the folder to which the message is moved: when you create a pipe to SpamAssassin, Evolution moves only the messages marked as spam to the Junk folder.

Once you've trained SpamAssassin, you may decide to remove the other filters. However, it takes less effort and is more effective to use all the filters mentioned here together. If my results are any indication, between the SpamAssassin filter and the basic ones, you should be able to detect more than 95% of incoming spam -- a much higher rate than with Thunderbird's built-in spam detection.

Fine-tuning

Once your filters are in place, expect to spend two or three days adjusting them. Unless you have a savagely logical mind and an eidetic memory, chances are that the combination of filters will yield results you didn't anticipate. It's also easy to make mistakes when selecting from drop-down lists. Expect to correct the filters and juggle the order in which they are applied. If you apply all the filters listed here, you will probably have to decide which filters to delete because together they flag nearly all of your incoming emails.

As you fine-tune, one point worth remembering is this: a filter that moves a non-spam message to Evolution's Inbox does not stop other filters from being applied to it -- even if you have used the Stop Processing Then statement. Since all messages pass through the Inbox, sending a message there simply returns it to the processing queue. The solution is to use Stop Processing on its own, without a Move to Folder Inbox statement.

Similarly, you may notice that SpamAssassin treats mailing list messages as spam. If you create rules that send mailing list messages to specific folders and place those rules above the SpamAssassin filter, you should be able to solve the problem. Joe Barr's "Nine tips for filtering messages in Novell Evolution" offers additional clues for solving problems you may encounter.

If the filters suggested here don't give you a spam-free life, you can enlist additional help. Procmail provides more sophisticated filters than Evolution itself, and it is very effective when combined with the anti-spam program Bogofilter. The drawback to these programs is that Evolution does not support them directly, so much of the configuration has to be done outside Evolution. Still, many users, especially professional system administrators, prefer a defense in depth that deploys several different anti-spam programs at once.

However, the filters listed here may be all that a home or small business user needs. For me, debugging them was more a matter of eliminating false positives than of spam slipping through the net.

Spam is a shifting target, and I expect I'll need to make more changes in the future, especially retraining SpamAssassin. But for now, aside from quick checks each day for false positives, I'm enjoying the blissful luxury of pretending that spam no longer exists.

Bruce Byfield is a computer journalist, course designer, and instructor who contributes regularly to NewsForge and Linux Journal.

Bruce Byfield is a computer journalist who writes regularly for Linux.com.

Comments

on Filtering spam in Novell Evolution

why do this?

Posted by: Anonymous Coward on July 06, 2005 07:12 PM
why do any of these when evolution has built-in support for spam? you shouldn't need to make any filters at all, when spam comes in, mark it as junk (by clicking the junk button, which moves it to the junk folder and "learns" about it). i do this and don't have more than a msg or two get through in a day. i believe this method just uses spamassassin under the covers, so the things mentioned in this article just seem to duplicate that.

Three words: blacklist support.

Posted by: Anonymous Coward on July 07, 2005 08:22 AM
Once a spammer is blacklisted, I don't need to click on anything. It's junked.

Also content filtering. Another nice way of junking.

For Thunderbird, I have to look up whether I can do this.

It would get rid of a lot of my junk.

Even if only 1 or 2 are getting through a day, if you only check your email monthly, that is over 30 of them. People who don't check their mail often, or who get a lot of spam, need this.

Re:why do this?

Posted by: Bruce Byfield on July 07, 2005 10:13 AM
Because the built-in support depends on both your distribution and your Internet connection.

Too much hassle

Posted by: Anonymous Coward on July 07, 2005 08:27 PM
This is too much hassle. I don't know how well Evolution is integrated with SpamAssassin these days (I barely use it, as I don't like its interface), but you would be much better off with POPFile: http://popfile.sourceforge.net/

POPFile has the best Bayesian detection filters I have ever seen, and it rarely fails when tagging your e-mail. Of course, like any other Bayesian spam filter, you'll have to spend some time "training" it to make sure it classifies your e-mail properly, but I achieved that in just one week, and I've seen several testimonials that claim the same thing. (Keep in mind that your mileage may vary depending on how much e-mail you receive each day and on the noise-to-signal ratio, of course.)

The only catch is that you have to get used to managing it through a web interface, so you have to keep it open in your browser sometimes to make tweaks here and there, but you'll probably get used to it, and after some time you'll rarely need to do that.

It's definitely worth a look!

Can't use Junk folder in Debian?

Posted by: Anonymous Coward on July 08, 2005 12:11 AM
Hi, I am currently using Debian Etch, but I'm sure this works in Sarge too: instruct your filter to set a mail's status to "Junk", and it will be moved to the automatically created Junk folder. Then add any other actions you wish (e.g. set status to "Read", then stop processing).

Re:Can't use Junk folder in Debian?

Posted by: Anonymous Coward on July 09, 2005 03:31 AM
Yeah, Bruce could have saved lots of time and perhaps even writer's cramp by simply updating to a fairly recent version of Evolution.

With that said, I'd like to point out that setting up elaborate filters and spam detection daemons on mail clients is not a good idea. What happens when you use a different client or system? What happens when you are on the road? Why waste your bandwidth transferring spam in the first place?

The better solution is to implement the spam filtering and virus scanning at the system level, preferably on the mail server itself. That means that your filtering rules and training are portable and independent of your location or client software. Having it on the mail server also allows you to use RBLs and other techniques that block the spam before it is even transferred.

Presently, 300 attempts per day are made to deliver spam to me. 60% of those 300 is blocked, before transfer, by RBL checking. The remaining 39% is caught at the mail server by Spamassassin and ClamAV. Approximately 3 spams a week and 0 viruses or worms actually make it to my mailbox. I suppose I could create filters and use Spamassassin on Evolution to deal with those 3 messages, since I already use Evolution and it has these features built in. But, at 3 spams a week, it isn't worth the effort to configure and train the client. The Delete key is much more efficient.

Why Is This Even Necessary?

Posted by: Anonymous Coward on July 06, 2005 08:57 PM
Evolution came with spam filtering turned on for me by default when I installed it. It also has a Junk mail folder set up by default when it was installed, into which that spam gets routed. Why is this article even necessary? Does it not get set up that way for everyone?

Re:Why Is This Even Necessary?

Posted by: Anonymous Coward on July 07, 2005 01:37 AM
No, it doesn't.

Partly, it depends on your mail server. For some reason, POP3 servers aren't set up automatically.

Don't want to configure

Posted by: Anonymous Coward on July 09, 2005 01:56 AM
Sure, some users enjoy configuring things to get them to work the way they want.
But I think most users don't want to spend hours or whole nights reading how-tos and guides like this one, and then trying for hours to configure things right.
The ability to fine-tune is excellent, but I think most users want it to just work.

redundant filter

Posted by: Anonymous Coward on July 13, 2005 10:15 PM
Your 3 step filter is a little redundant:

  1. Move to Folder Spam
  2. Set Status Read
  3. Stop Processing

Once the message is moved, it is deleted from incoming or from the folder it was in, so setting its status to read is effectively a no-op.

For those who care, in 2.4 (maybe 2.2.something, i can't remember), move does an implicit stop now too.


  - notzed
