June 14, 2007

Using RBL and DCC for spam protection

Author: Murthy Raju

I run a Postfix-based mail server that services a few hundred users with an average load of a couple of thousand legitimate messages a day -- but thanks to spam, the actual load on the server is much higher. I use Realtime Blackhole Lists (RBL) and Distributed Checksum Clearinghouse (DCC) clients on Postfix and SpamAssassin to reduce the impact of spam.

RBLs are lists of IP addresses of known and potential spam originators. There are many RBL providers, such as Spamhaus, SpamCop, and DNSRBL. These lists are also known as blacklists or blocking lists. Mail servers query RBL servers to check the IP addresses of the clients that connect to them.

RBL providers add IP addresses to their lists that fall into any of the following categories:

  • Known originator of spam
  • Open SMTP relay that can be misused by any mail server in the world to send spam
  • Dynamically assigned IP addresses of DSL or dial-up customers of ISPs who allow their machines to be used to originate spam
  • IP addresses of computers affected by mass mailing viruses or trojans

RBL providers may also take various other parameters into account for listing an IP address. Each RBL provider has its own strategy for gathering IP addresses. The process may involve actively checking large sets of IP addresses for potential listing.

An IP address that lands on an RBL won't stay there forever. Some providers automatically delist IP addresses after a specific time period. Others delist on request from the owners of the affected IP addresses, once they are convinced that the cause for listing no longer exists. Ease of delisting varies among providers; getting off a list can be a cumbersome task.

How does the RBL system work?

When an IP address is identified for listing, the RBL provider sets up a corresponding record in its DNS database. For example, if an IP address were identified for inclusion in the Spamhaus RBL, Spamhaus would add a DNS record for a pseudo-host derived from that address (by convention, the address's octets in reverse order, followed by the list's DNS zone). This makes it easy for clients to look up the IP address using the DNS protocol.
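The naming convention can be sketched in a few lines of Python. This is only an illustration: the helper name dnsbl_query_name and the address 192.0.2.1 (reserved for documentation) are assumptions made for the example, not a real listing.

```python
import ipaddress

def dnsbl_query_name(ip, zone):
    """Build the pseudo-hostname a client looks up for an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

# Illustrative address only; 192.0.2.1 is reserved for documentation.
print(dnsbl_query_name("192.0.2.1", "zen.spamhaus.org"))
# prints 1.2.0.192.zen.spamhaus.org
```

Reversing the octets puts the most specific part of the address first, which matches the way DNS names are delegated from right to left.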

A variation on RBL is the Right Hand Side Black List (RHSBL), which is a list of hostnames instead of IP addresses. These lists are useful when spammers use machines with different IP addresses but the same hostname. Generally, providers offer either RBL or RHSBL, but not both, so to check both you need to query more than one list.

An SMTP server that wants to check the status of an IP address or hostname makes a DNS query for a record of type TXT for the suitably constructed hostname. If a record is returned, it contains the URL where further information on the listing is provided (example: text = "http://www.spamhaus.org/query/bl?ip="). Using this information, the SMTP server can reject the connection with a suitable error message. If no record is returned, there is no match on the list and the SMTP server can accept the connection.
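The lookup itself can be approximated with Python's standard library. The sketch below makes some assumptions: it asks for an address (A) record rather than a TXT record, because the standard library cannot query TXT records directly, and an A-record answer is how clients commonly test for a listing. The zone bl.example.org and the resolve parameter are hypothetical, added so the function can be exercised without network access.

```python
import socket

def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the reversed-IP name resolves inside the list's zone."""
    name = ".".join(reversed(ip.split("."))) + "." + zone
    try:
        resolve(name)          # any answer means the address is on the list
        return True
    except socket.gaierror:    # no such name (or lookup failure): treat as not listed
        return False

# Example call against a hypothetical zone:
# is_listed("192.0.2.1", "bl.example.org")
```

Treating a failed lookup as "not listed" is a deliberate choice here: a mail server that rejected mail whenever the list was unreachable would turn a DNS outage into a mail outage.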

On Postfix, you can set up RHSBL/RBL lookups by adding lines like the following to /etc/postfix/main.cf:

smtpd_recipient_restrictions =
        reject_rhsbl_client blackhole.securitysage.com,
        reject_rhsbl_sender blackhole.securitysage.com,
        reject_rbl_client zen.spamhaus.org,
        reject_rbl_client bl.spamcop.net

With the above settings, the Postfix server makes four lookups for each incoming delivery attempt against three lists: the client hostname and the sender domain on the RHSBL, and the client IP address on the two RBLs. If any lookup matches, Postfix rejects the delivery attempt.

When I implemented the above configuration on my Postfix server, it started rejecting about 75% of all connections based on the RBL lookup results.

Though this looks like a boon to mail server administrators, you have to be cautious about a potential downside of this approach. Since IP addresses can get into RBLs for various reasons, an IP address may be listed unfairly. One such scenario is where an RBL provider decides to list an entire range instead of individual IP addresses. This may happen when a large number of IP addresses in a range satisfy the criteria for listing. Mail server administrators need to keep this possibility in mind and make a considered decision, based on their site policies, before filtering at the SMTP connection level on the basis of RBLs alone.

RBL lookups using SpamAssassin

SpamAssassin is a widely used open source application (released under the Apache License) that fights spam. SpamAssassin can evaluate an email message based on many local and network-based checks and assign a score to it. You can use this score to assess whether a message is likely to be spam and take a suitable action, such as dropping it, labeling it as spam, or sending it to a different folder.

You can configure SpamAssassin to do RBL lookups as a part of its evaluation routine. This method can be less aggressive than that of dropping the SMTP connection for an RBL match alone.
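To see why a scored check is gentler than an outright rejection, consider the following sketch. It assumes SpamAssassin's default threshold of 5.0; the rule names and scores are hypothetical and chosen only to illustrate the arithmetic, not taken from SpamAssassin's real rule set.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default threshold (required_score)

# Hypothetical rule names and scores, for illustration only.
RULE_SCORES = {
    "RBL_MATCH": 2.5,
    "SUSPICIOUS_SUBJECT": 1.5,
    "HTML_ONLY": 1.25,
}

def classify(hits, required=REQUIRED_SCORE):
    """Sum the scores of the rules that fired and compare to the threshold."""
    score = sum(RULE_SCORES[name] for name in hits)
    return score, score >= required

print(classify(["RBL_MATCH"]))
# prints (2.5, False)
print(classify(["RBL_MATCH", "SUSPICIOUS_SUBJECT", "HTML_ONLY"]))
# prints (5.25, True)
```

An RBL match on its own leaves the message below the threshold; it is tagged as spam only when other checks agree.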

To enable RBL lookups, add the following to SpamAssassin's configuration file (/etc/mail/spamassassin/local.cf):

skip_rbl_checks  0  # This is the default and enables lookups
rbl_timeout 15 # Timeout for lookups in seconds

SpamAssassin also adds headers to messages to indicate that the RBL lookup was positive. It can additionally label email as spam by adding a tag like [SPAM] to the subject line. Users can filter such messages into a different folder on their IMAP server or in their mail clients.
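A filter on the receiving side can key off those headers. The sketch below assumes the X-Spam-Flag header that SpamAssassin adds to spam in its default configuration; the function name is hypothetical, and a real deployment would more likely use the filtering rules built into the mail client or IMAP server.

```python
from email import message_from_string

def is_flagged_spam(raw_message):
    """Return True if the message carries an 'X-Spam-Flag: YES' header."""
    msg = message_from_string(raw_message)
    return msg.get("X-Spam-Flag", "").strip().upper() == "YES"

sample = "Subject: [SPAM] Cheap watches\nX-Spam-Flag: YES\n\nBuy now\n"
print(is_flagged_spam(sample))
# prints True
```

Matching on a header rather than on the subject tag is usually more robust, since a subject tag can be imitated by the sender or altered in replies.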

Distributed Checksum Clearinghouse

Unfortunately for email administrators, spammers do not always use the same hosts to push their spam messages and may escape checks like RBL lookups. But SpamAssassin can also do a network-based lookup called Distributed Checksum Clearinghouse (DCC) that helps identify whether a message is bulk mail already in circulation.

DCC servers maintain a count of the times a specific message has been looked up in the database. These servers do not store the entire message; instead, they use a checksum (or a set of checksums) for each message. The client generates a set of checksums for the message to be looked up and queries them on DCC servers. DCC servers return the number of times the given set of checksums has been looked up. After a threshold number of queries, the occurrence count becomes "many." In DCC, the clients actively contribute to the data. Since only checksums are transferred, privacy is not compromised, and the overhead of lookups is minimal.
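The counting idea can be illustrated with a toy model. Everything here is an assumption made for the sketch: real DCC uses its own fuzzy checksums and network protocol, whereas this example substitutes a SHA-256 hash of the whitespace-normalized body, an in-memory counter, and an arbitrary threshold of 3.

```python
import hashlib
from collections import Counter

MANY_THRESHOLD = 3  # arbitrary value for this sketch, not a DCC default

def body_checksum(body):
    """Stand-in for DCC's checksums: hash of the lowercased, space-collapsed body."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class ToyClearinghouse:
    """In-memory stand-in for a DCC server: it stores counts, never message text."""

    def __init__(self):
        self.counts = Counter()

    def report(self, checksum):
        """Record one lookup and return the running count, or 'many'."""
        self.counts[checksum] += 1
        n = self.counts[checksum]
        return "many" if n >= MANY_THRESHOLD else n

server = ToyClearinghouse()
checksum = body_checksum("Buy   cheap watches NOW")
print([server.report(checksum) for _ in range(3)])
# prints [1, 2, 'many']
```

Because each lookup also increments the count, every client that checks a message helps the next client recognize it as bulk mail.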

DCC can integrate into most anti-spam tools. You can configure SpamAssassin by editing local.cf to add the following lines:

use_dcc 1
dcc_home /var/dcc
dcc_path /usr/local/bin/dccproc
add_header all  DCC _DCCB_: _DCCR_

If you want messages from certain addresses or certain domains to bypass these checks, you can whitelist them by adding the following lines to local.cf:

whitelist_from somebody@example.com somebodyelse@example.com
whitelist_from *@example.com
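The patterns are file-glob style, where * matches any run of characters and ? matches a single character. The following sketch approximates that matching; it is not SpamAssassin's own code, and the function names are hypothetical.

```python
import re

def glob_to_regex(pattern):
    """Translate a glob with only * and ? wildcards into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.IGNORECASE)

def is_whitelisted(sender, patterns):
    """Return True if the sender address fully matches any pattern."""
    return any(glob_to_regex(p).fullmatch(sender) for p in patterns)

patterns = ["somebody@example.com", "*@example.com"]
print(is_whitelisted("anyone@example.com", patterns))
# prints True
print(is_whitelisted("anyone@sub.example.com", patterns))
# prints False
```

Note that *@example.com does not cover subdomains; those need their own pattern.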

Apart from RBLs and DCC, there are several other approaches to fighting spam, such as looking for known patterns in the mail content and analyzing peculiarities in mail headers.

Anti-spam configuration on a mail server or SpamAssassin cannot be a one-time measure. Administrators must be alert and willing to make constant course corrections to face new challenges.

