System administrators need to keep an eye on their servers to make sure things are running smoothly. If they find a problem, they need to see when it started, so investigations can focus on what happened at that time. That means logging information at regular intervals and having a quick way to analyse this data. Here's a look at several tools that let you monitor one or more servers from a Web interface.
Each tool has a slightly different focus, so we'll take them all for a spin to give you an idea of which you might like to install on your machine. The language and design they use to perform statistic logging can have an impact on efficiency. For example, collectd is written in C and runs as a daemon, so it does not have to create any new processes in order to gather system information. Other collection programs might be written in Perl and be regularly spawned from cron. While your disk cache will likely contain the Perl interpreter and any Perl modules used by the collection program, the system will need to spawn one or more new processes regularly to gather the system information.
The tools we'll take a look at often use other tools themselves, such as RRDtool, which includes tools to store time series data and graph it.
The RRDtool project focuses on making storing new time series data have a low impact on the disk subsystem. You might at first think this is no big deal. If you are just capturing a few values every 5-10 seconds, appending them to the end of a file shouldn't even be noticeable on a server. Then you may start monitoring the CPU (load, nice values, each core -- maybe as many as 16 individual values), memory (swap, cache sizes -- perhaps another five), the free space on your disks (20 values?) and a collection of metrics from your UPS (perhaps the most interesting 10 values). Even without considering network traffic, you can be logging 50 values from your system every 10 seconds.
RRDtool is designed to write these values to disk in 4KB blocks instead of as little 4- or 8-byte writes, so the system does not have to perform work every logging interval. If another tool wants all the data, it can issue a flush, ensuring that all the data RRDtool might be caching is stored to disk. RRDtool is also smart about telling the Linux kernel what caching policy to use on its files. Since data is written 4KB at a time, it doesn't make sense to keep it around in caches, as it will be needed again only if you perform analysis, perhaps using the RRDtool graph command.
As system monitoring tools write to files often, you might like to optimize where those files are stored. On conventional spinning disks, filesystem seeks are more expensive than sequentially reading many blocks, so when the Linux kernel needs to read a block it also reads some of the subsequent blocks into cache just in case the application wants them later. Because RRDtool files are written often, and usually only in small chunks the size of single disk blocks, you might do better turning off readahead for the partition you are storing your RRDfiles on. You can minimize the block readahead value for a device to two disk blocks with the blockdev program from util-linux-ng with a command like blockdev --setra 16 /dev/sdX. Turning off a-time updates and using writeback mode for the filesystem and RAID will also help performance. RRDtool provides advice for tuning your system for RRD.
The collectd project is designed for repeatedly collecting information about your systems. While the tarball includes a Web interface to analyse this information, the project says this interface is a minimal example and cites other projects such as Cacti for those who are looking for a Web interface to the information gathered by collectd.
There are collectd packages for Debian Etch, Fedora 9, and as a 1-Click for openSUSE. The application is written in C and is designed to run as a daemon, making it a low overhead application that is capable of logging information at short intervals without significant impact on the system.
When you are installing collectd packages you might like to investigate the plugin packages too. One of the major strengths of collectd is support through plugins for monitoring a wide variety of information about systems, such as databases, UPSes, general system parameters, and NFS and other server performance. Unfortunately the plugins pose a challenge to the packagers. For openSUSE you can simply install the plugins-all package. The version (4.4.x) packaged for Fedora 9 is too old to include the PostgreSQL plugin. The Network UPS Tools (NUT) plugin is not packaged for Debian, openSUSE, or Fedora.
The simplest way to resolve this for now is to build collectd from source, configuring exactly the plugins that you are interested in. Some of the plugins that are not commonly packaged at the time of writing but that you may wish to use include NUT, netlink, postgresql, and iptables. Installation of collectd follows the normal ./configure; make; sudo make install process, but your configure line will likely be very long if you specify which plugins you want compiled. The installation procedure and plugins I selected are shown in the below command block. I used the init.d startup file provided in the contrib directory and had to change a few of the paths because of the private prefix I used to keep the collectd installation under a single directory tree. Note that I had to also build a private copy of iproute2 in order to get the libnetlink library on Fedora 9.
$ cd ./collectd-4.5.1 $ ./configure --prefix=/usr/local/collectd \ --with-perl-bindings=INSTALLDIRS=vendor \ --without-libiptc --disable-ascent --disable-static \ --enable-postgresql --enable-mysql --enable-sensors \ --enable-email --enable-apache --enable-perl \ --enable-unixsock --enable-ipmi --enable-cpu --enable-nut \ --enable-xmms --enable-notify_email --enable-notify_desktop \ --disable-ipmi --with-libnetlink=/usr/local/collectd/iproute2-2.6.26 $ make ... $ sudo make install $ su # install -m 700 contrib/fedora/init.d-collectd /etc/init.d/collectd # vi /etc/init.d/collectd ... CONFIG=/usr/local/collectd/etc/collectd.conf ... daemon /usr/local/collectd/sbin/collectd -C "$CONFIG" # chkconfig collectd on
If more of the optional collectd plugins are packaged in the future, you may be able to install your desired build of collectd without having to resort to building from source.
Before you start collectd, take a look at etc/collectd.conf and make sure you like the list of plugins that are enabled and their options. The configuration file defines a handful of global options followed by some LoadPlugin lines that nominate which plugins you want collectd to use. Configuration of each plugin is done inside a <Plugin foo>...</Plugin> scope. You should also check in your configuration file that the rrdtool plugin is enabled and that the DataDir parameter is set to a directory that exists and which you are happy to have variable data stored in.
When you have given a brief look at the plugins that are enabled and their options, you can start collectd by running service collectd status as root.
To see the information collectd gathers you have to install the Web interface or another project such as Cacti. The below steps should install the basic CGI script that is supplied with collectd. The screenshot that follows the steps shows the script in action.
# yum install rrdtool-perl # cp contrib/collection.conf /etc/ # vi /etc/collection.conf datadir: "/var/lib/collectd/rrd/" libdir: "/usr/local/collectd/lib/collectd/" # cp collection.cgi /var/www/cgi-bin/ # chgrp apache /var/www/cgi-bin/collection.cgi
If you want to view your collectd data on a KDE desktop, check out kcollectd, a young but already useful project. You can also integrate the generated RRDtool files with Cacti, though the setup process is very longwinded for each graph.
With collectd the emphasis is squarely on monitoring your systems, and the provided Web interface is offered purely as an example that might be of interest. Being a daemon written in C, collectd can also run with minimal overhead on the system.
Next: Cacti
Note: Comments are owned by the poster. We are not responsible for their content.
Four winning ways to monitor machines through Web interfaces
Posted by: Anonymous [ip: 24.130.151.91] on November 04, 2008 04:50 PM#