Destress Your Mail Server with Postfix Troubleshooting Tips: Part 1
Several mail servers are available on the market today. Some are rather confounding to configure but are super-reliable. Others are a little old school and rely on no small degree of background knowledge to get them up and running.
According to the website Mail Radar, in a randomized survey of 59,209 servers, some 24 percent were running Sendmail, the parent of all popular MTAs (Mail Transfer Agents). This was followed by Postfix at 20 percent and qmail at 17 percent. Surprisingly, almost half (48 percent) of the IP addresses hosting these mail servers were registered to ranges in the United States.
Having used the mighty qmail, which is so efficient that it thwarted non-trivial Mail Bomb attacks in the late 1990s without breaking a sweat at an ISP that I worked for, I became a Postfix convert around the time that the most popular Linux distributions took a shine to its reliability and began including it as their default MTA.
To say that Postfix is brimming with features is an understatement. There are simply too many features to cover. This powerful mail server can process prodigious amounts of emails in a highly efficient manner and still pay attention to somewhat obscure, but very useful, config options.
The main Postfix manual spanned some 8,800 words (and there are several other manuals in addition) so it’s not surprising that one of the topics that Postfix pays attention to is getting the best possible performance out of your mail server.
In this series, I’ll look at a few tips that the Postfix documentation recommends to get the most out of your MTA build and process emails more efficiently. Not all scenarios will apply to your build of course, but -- as with all things -- understanding various scenarios should help improve your ability to troubleshoot in the future.
For clarity, when “clients” are mentioned, they are almost always other mail servers and not what some might consider as mail clients (i.e., the ones you pick up email with, such as Thunderbird or Evolution etc.).
There’s a welcome file bundled with Postfix called “DEBUG_README,” which cites the best routes to follow when solving an issue. It can be seen in Listing 1:
* Look for obvious signs of trouble
* Debugging Postfix from inside
* Try turning off chroot operation in master.cf
* Verbose logging for specific SMTP connections
* Record the SMTP session with a network sniffer
* Making Postfix daemon programs more verbose
* Manually tracing a Postfix daemon process
* Automatically tracing a Postfix daemon process
* Running daemon programs with the interactive ddd debugger
* Running daemon programs with the interactive gdb debugger
* Running daemon programs under a non-interactive debugger
* Unreasonable behavior
* Reporting problems to email@example.com
Listing 1: Starting from the top, here is how Postfix recommends you approach troubleshooting any issues that you encounter.
If you’re ever hit by a mountain-sized amount of email from a particular domain name and you haven’t set up your anti-spam filters correctly beforehand, then you can use a clever tool called qshape. Although this tool is apparently bundled in certain older versions, I couldn’t find it on my Postfix 2.6 installation, so this is the way forward if you need to install it on Red Hat derivatives and Debian derivatives, respectively:
# yum install postfix-perl-scripts # apt-get install postfix-perl-scripts
Any issues with running qshape once the package is installed might relate to the fact that Perl isn’t installed, so look into the relevant packages for your distribution if you encounter headaches. On Red Hat and its children, you can reportedly do that by running this command:
# yum groupinstall perl development
Now that we have a working installation of qshape, let’s see how it works. The ever-helpful Postfix docs discuss the fact that qshape offers a formatted table of your mail queue usage -- as seen in Listing 2 -- having run this command (for the “hold” queue, which I’ll look at in a moment):
# qshape -s hold | head T 5 10 20 40 80 160 320 640 1280 1280+ TOTAL 486 0 0 1 0 0 2 4 20 40 419 yahoo.com 14 0 0 1 0 0 0 0 1 0 12 extremepricecuts.net 13 0 0 0 0 0 0 0 2 0 11 ms35.hinet.net 12 0 0 0 0 0 0 0 0 1 11 winnersdaily.net 12 0 0 0 0 0 0 0 2 0 10 hotmail.com 11 0 0 0 0 0 0 0 0 1 10 worldnet.fr 6 0 0 0 0 0 0 0 0 0 6 ms41.hinet.net 6 0 0 0 0 0 0 0 0 0 6 osn.de 5 0 0 0 0 0 1 0 0 0 4
Listing 2: Results from the “hold” queue on Postfix using qshape.
In Listing 2, the top line showing "T" stands for the total “sender” count for each domain name. The manual explains that the “columns with numbers above them show counts for messages aged fewer than that many minutes, but not younger than the age limit for the previous column.” Then, the “TOTAL” count tallies up all domain names.
In Listing 2, there are 14 messages that may or may not have legitimately come from “yahoo.com.” One of those messages is between 10 and 20 minutes old, one is between 320 and 640 minutes old and 12 are older than 1,280 minutes, which is rapidly approaching a full day.
The insightful qshape tool should assist you in discerning which offenders are clogging up your mail server’s pipes. Once you’ve gleaned that information, you can then create filters or rules to help reduce the volume of emails from problematic senders.
In Listing 2, we used the “hold” queue. Postfix uses a number of different queues which offers exceptional flexibility and keeps it running efficiently. Table 1 shows what each is used for.
This queue receives incoming e-mail either via network interfaces or by the local mail “pickup” daemon via the “maildrop” directory.
When Postfix’s queue manager opens an email with a view to delivering it such an email sits in the active queue. This queue is limited in size for performance.
If an initial effort to deliver email fails to succeed then this queue picks up that email. The delays between attempted retries are cleverly increased over time.
Any emails that Postfix can’t read and thinks may be damaged or corrupt in some way are moved into this queue for inspection.
If you need to manually delay either individual or volume transmissions of certain emails for any reason, then you can set them to sit in the hold queue, essentially in jail, until you deem them publicly acceptable.
Table 1: The different queues which Postfix uses to assist with achieving the best efficiency.
If you look in the “/var/spool/postfix” directory, you should see something similar to that displayed in Figure 1 (potentially depending on your version and distribution).
An example of an unhealthy queue is also available in the Postfix docs.
The docs discuss a dictionary-based attack where multitudinous email addresses are emailed in the hope that they are valid. We check the “deferred” queue with this command as so:
# qshape -s deferred | head T 5 10 20 40 80 160 320 640 1280 1280+ TOTAL 2193 4 4 5 8 33 56 104 205 465 1309 MAILER-DAEMON 1709 4 4 5 8 33 55 101 198 452 849 example.com 263 0 0 0 0 0 0 0 0 2 261 example.org 209 0 0 0 0 0 1 3 6 11 188 example.net 6 0 0 0 0 0 0 0 0 0 6 example.edu 3 0 0 0 0 0 0 0 0 0 3 example.gov 2 0 0 0 0 0 0 0 1 0 1 example.mil 1 0 0 0 0 0 0 0 0 0 1
Listing 3: A mail queue after an attack that hunted for valid addresses to send spam to.
It’s important to note that a busy “deferred” mail queue is not always cause for alarm. This is especially the case if your “active” and “incoming” queues are relatively quiet so that mail deliveries can continue efficiently. In Listing 3, however, the “MAILER-DAEMON” entries display a large number of bounces, and some 849 emails have been sitting undelivered for some time.
As you can imagine, digging out such information from the logs is a far more difficult challenge. The qshape tool is a welcome addition to any admin’s Postfix toolkit and comes highly recommended for a quick glance at what your MTA is up to. The installation, if it’s not bundled with your version, is worth the effort.
Next, let’s spend a moment considering a smattering of other issues that might need looked at on an MTA.
One of the few times that you tinker with the “master.cf” file -- which is a relatively dangerous undertaking relative to other config files -- is when you introduce the running of Postfix inside a chrooted environment. In other words, it's when you run Postfix inside a limited environment so it has little chance of contaminating other parts of your system should if compromised.
The docs suggest that not managing to set up a chroot correctly is fairly common, so if you have problems, it should be one of the first things that you disable. Apparently, the master Postfix process won’t run chrooted thanks to the fact that it is needed to spawn the other processes. So, don’t let that confuse you if you can’t get a chroot working. I’m guessing that this “root” privilege might be required in the same way that Apache needs the “root” user to open a privileged TCP port (where the initial process runs as “root”), and from there many child processes run as the less-privileged users, such as “httpd” or “apache”.
In the past, I’ve seen an issue where the mail logs contain lots of entries mentioning “Unknown”. If you find yourself flummoxed by such errors, you can rest assured that they usually relate to DNS lookup problems. Check that the server’s local “resolv.conf” file holds valid name server information or if those servers are down or misbehaving. Essentially, this error means simply that no hostname resolved correctly so the machine name remains “unknown” to Postfix.
Before I look at how to pacify an unhappily busy Postfix instance -- one which is so stressed with load that it’s creaking at the seams -- I will first consider that (until recently at least) each and every SMTP connection to your MTA spawned an individual process. If your server is having a tantrum because it’s too busy, then you’re likely not getting the best out of your SMTP processes. And, there’s a chance that the limit of the number of processes allocated is too low and needs adjusted within the “master.cf” file. Let’s look further into this issue, albeit briefly.
Although this next recommendation only applies to deprecated versions in my experience, you can never predict when you might have an ancient version of a piece of software foisted upon you to fix. A somewhat older approach to machines that abused the services provided by a server was referred to as “tar-pitting.” This is where a delay was purposely introduced at the initial greeting stage. The idea was that because of this irritating delay the attraction of sending lots of spam was lessened (and less profitable). This was phased out in Postfix 2.0, but there is a useful setting to disable this “sleep” time. Again, this is not needed from Postfix 2.1 on but resides in “/etc/postfix/main.cf”:
This was phased out because other anti-spam measures were successfully employed, and because introducing lengthier sessions to every single email exchange that your server makes is, upon reflection, not always entirely helpful.
I mention this because a couple of clever options were spun off the basic premise of “sleeping.” I’ve used it (along with many other carefully considered options) so successfully in the past that allmost every piece of spam was caught before hitting my Inbox. Let’s look at the newer, more evolved options now.
First, the “smtpd_soft_error_limit” option allows us to set a limit of errors that an individual connection generates. This defaults at 10 errors. If that count is reached, then Postfix will delay any valid and invalid responses it sends by the value configured by the option “smtpd_error_sleep_time” in seconds (which is one second by default).
Having seen the name of the first option, you might have guessed that the second option refers to “hard” limits: “smtpd_hard_error_limit”. If the default limit of 20 is reached, then Postfix dutifully stops speaking to the remote server completely, throws down its toys, and storms off in a huff.
You’re advised to tweak these error limits to suit your preferences. Of course, it’s sensible to start off being less restrictive and slowly adjusting them (followed by reloading Postfix’s config each time). I have to admit that I opted for the other route (mainly because the mail server on which I tested the config changes only received a small amount of email traffic).
Knowing that I had half an hour to tune the aforementioned settings to my heart’s content, I introduced relatively draconian limits and immediately locked out all the pesky spam. Gingerly, I opened up some of the limits and began to see ham (legitimate emails) slowly return from their MTAs to retry their deliveries having been told to delay their initial attempt. After some trial and error (and only a few emails were ultimately delayed by 5 or 10 minutes), there was no spam to be seen.
In this article, I covered some ways to process emails more efficiently and discussed how to debug various MTA issues. In part two of this series, I’ll take a closer look at some warning signs and consider ways to mitigate issues when you see that your server is stressed.
Chris Binnie is a Technical Consultant with 20 years of Linux experience and a writer for Linux Magazine and Admin Magazine. His new book Linux Server Security: Hack and Defend teaches you how to launch sophisticated attacks, make your servers invisible and crack complex passwords.