The Third Commandment of system administration


Author: Brian Warshawsky

Nothing will make your team look worse than getting a call from an angry boss wondering why he couldn’t get into the company email server all weekend. Server and service monitoring lets you know at all times which items on your network are up, which are down, which servers are struggling to handle their load, and just about any other availability information you could want. By configuring alerts and notifications, you can be the first to know when a problem arises on your network.

III. Thou shalt be the first to know when something goes down

Free and open source options for service availability monitoring include Big Sister, Nagios, and Zabbix, among others. All three are excellent pieces of software that offer a wide range of abilities, but the one we will focus on here is Zabbix.

Zabbix is a Web-based monitoring tool, which makes configuration and maintenance simple and efficient. It can monitor any SNMP-enabled device or application, as well as any server running AIX, *BSD, HP-UX, Linux, Mac OS X, Solaris, Tru64/OSF, or Microsoft Windows. Zabbix itself will run on any of those operating systems except Windows.

All of this is great, but the real usefulness of Zabbix lies in its ability to send alerts when an issue arises, via email, SMS, or pager messages, depending on how you have it configured. Zabbix supports user-defined triggers, meaning you can tell it to watch the CPU utilization and disk space on a server. Should a problem arise, Zabbix will alert you right away, so you don’t find out about it Monday morning.

Future versions of Zabbix will support alert escalations, so that if the first person in line doesn’t acknowledge the alert to let Zabbix know someone is handling it, Zabbix will page the next person in line. For now, the functionality is limited to one page per outage, which is not an issue if you are as religiously devoted to your cell phone and pager as I am. I have notifications configured as emails to my two-way pager. I’ve also had success sending alerts to my cell phone’s email address and receiving them there as text messages. Check with your provider to find out what the email address for your phone is.
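To make the trigger idea concrete, here is a minimal Python sketch of threshold triggers that send one notification per outage. It is not Zabbix code and does not use any Zabbix API. The names Trigger and Monitor, the metric names cpu_percent and disk_free_percent, and the 90 and 10 percent limits are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Trigger:
    """Hypothetical user-defined trigger: fires when a metric crosses a limit."""
    name: str
    metric: str
    limit: float
    fires_above: bool = True  # False means "fire when the value drops below limit"

    def is_firing(self, value: float) -> bool:
        return value > self.limit if self.fires_above else value < self.limit


@dataclass
class Monitor:
    """Checks metrics against triggers and notifies once per outage."""
    triggers: List[Trigger]
    notify: Callable[[str], None]  # e.g. a function that emails a pager
    active: Set[str] = field(default_factory=set)  # triggers already alerted on

    def check(self, metrics: Dict[str, float]) -> None:
        for trigger in self.triggers:
            if trigger.metric not in metrics:
                continue  # no data for this metric in this sample
            value = metrics[trigger.metric]
            if trigger.is_firing(value):
                # Alert only on the transition into the problem state.
                if trigger.name not in self.active:
                    self.active.add(trigger.name)
                    self.notify(f"PROBLEM: {trigger.name} ({trigger.metric}={value})")
            else:
                # Recovered: a later breach counts as a new outage.
                self.active.discard(trigger.name)


if __name__ == "__main__":
    monitor = Monitor(
        triggers=[
            Trigger("High CPU", "cpu_percent", 90.0),
            Trigger("Low disk", "disk_free_percent", 10.0, fires_above=False),
        ],
        notify=print,  # stand-in for an email or pager gateway
    )
    monitor.check({"cpu_percent": 95.0, "disk_free_percent": 40.0})  # alerts
    monitor.check({"cpu_percent": 97.0, "disk_free_percent": 40.0})  # same outage, silent
```

Tracking which triggers are already active is what keeps a sustained problem from paging you on every check. An escalation scheme would build on the same state by also recording whether anyone has acknowledged the alert.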

Alerts and pages might be a little annoying at 3 a.m., but they can be great indicators of a larger problem about to occur. In one instance, devices behind a router kept timing out and dropping packets, though the router itself never missed a ping. That pattern clued me in to the fact that one of the router’s interfaces was about to die, and allowed me to replace it before it caused a complete crash and outage. Without Zabbix monitoring that interface, the problem would first have shown up as slowness on that part of the network, which is a much vaguer symptom and harder to trace.
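The reasoning in that anecdote can be written down as a simple rule: if the router answers its pings but the devices behind one of its interfaces keep losing packets, suspect that interface. The Python sketch below is a rough illustration of that rule only. The function name suspect_interfaces, the interface names, and the 5 percent loss threshold are assumptions, and gathering the loss figures is left to whatever monitoring tool you use.

```python
from typing import Dict, List


def suspect_interfaces(
    router_loss: float,
    downstream_loss: Dict[str, List[float]],
    threshold: float = 0.05,  # assumed cutoff: 5 percent packet loss
) -> List[str]:
    """Return interfaces whose downstream devices drop packets
    while the router itself still answers pings reliably.

    router_loss: fraction of pings to the router that were lost (0.0 to 1.0).
    downstream_loss: interface name -> loss fractions for devices behind it.
    """
    if router_loss > threshold:
        # The router itself is unreachable or flaky, so the loss behind it
        # says nothing specific about any one interface.
        return []

    suspects = []
    for interface, losses in downstream_loss.items():
        if not losses:
            continue  # no devices measured behind this interface
        average = sum(losses) / len(losses)
        if average > threshold:
            suspects.append(interface)
    return sorted(suspects)


if __name__ == "__main__":
    # Hypothetical sample: the router is healthy, devices behind eth1 are not.
    print(suspect_interfaces(0.0, {"eth1": [0.20, 0.30], "eth2": [0.0, 0.01]}))
```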

Installation of Zabbix is fairly straightforward, though there are some dependencies to note. You will need Apache, PHP 4, and either MySQL or PostgreSQL. It is good practice to use the latest versions of these packages that you can, as Zabbix is not recommended for use with some older versions. You will also need the PHP-GD and Net-SNMP development packages. These are all fairly common packages, and should be freely available for your distribution. Complete details on the installation and configuration of Zabbix can be found in the documentation that comes with the source code or on the Web. The Web site also runs a forum where you can compare notes with other Zabbix users and ask for help should something in the configuration or setup stump you.

Zabbix offers a wealth of other options, such as advanced graphing features, network mapping, and customer interfaces for quick reference on the health of a network as a whole. Anyone who has been caught unaware of a network or server problem should look into Zabbix as a monitoring solution.

The commandments so far:
I. Thou shalt make regular and complete backups
II. Thou shalt establish absolute trust in thy servers
III. Thou shalt be the first to know when something goes down