System administrators need to keep an eye on their servers to make sure things are running smoothly. If they find a problem, they need to see when it started, so investigations can focus on what happened at that time. That means logging information at regular intervals and having a quick way to analyse this data. Here's a look at several tools that let you monitor one or more servers from a Web interface.
The Munin project is fairly clearly split into the gather and analyse functionality. This lets you install just the package to gather information on many servers and have a single central server to analyse all the gathered information. Munin is also widely packaged, making setup and updates fairly simple.
Munin is written in Perl, ships with a collection of plugins, supports many versions of Unix operating systems, and has a searchable plugin site.
Munin is in the Fedora 9 repository, Debian Etch, and as a 1-Click for openSUSE 11. Again, I used the version from the repository for a 64-bit Fedora 9 machine. The project provides two packages: the munin-node package includes all of the monitoring functionality, and the munin package supports gathering information from machines running munin-node and graphing it via a Web interface. If you have a network of machines, you probably only want to install munin-node on most of them and munin on one to perform analysis on all the collected data.
The main configuration file for munin-node, /etc/munin/munin-node.conf, lets you define where log files are kept, what user to run the monitoring daemon as, what address and port the daemon should bind to, and which hosts are allowed to connect to that address and port to download the collected data. In the default configuration only localhost is allowed to connect to a munin-node.
You configure plugins through individual configuration files in /etc/munin/plugin-conf.d. Munin-node for Fedora is distributed with about three dozen plugins for monitoring a wide range of system and device information.
When you visit http://localhost/munin, Munin displays an overview page showing you links to all the nodes that it knows about and including links to specific features of the nodes, such as disk, network, NFS, and processes. Clicking on a node name shows you a two-column display. Each row shows a graph with the daily statistics on the left and weekly on the right. Clicking on either graph in a row takes you to a details page showing that data for the day, week, month, and year. At the bottom of the details page a short textual display gives more details about the data, including notification of irregular events. For example, in the below screenshot of the details page for free disk space, you can see a warning that one of the filesystems has become quite full.
To get an idea of how well a system or service is running on a daily or weekly basis, the Munin display works well. Unfortunately, the Web interface does not allow you to drill into the data. For example, you might like to see a specific two-hour period from yesterday, but you can't get that graph from Munin.
The plugins site for Munin is quite well done, allowing you to see an example graph for many of the plugins before downloading. A drawback to the plugins site is the search interface, which is very category-oriented. Some full text search would aid users in finding an appropriate plugin. For example, to find the NUT UPS monitoring plugin you have to select either Sensors or "ALL CATEGORIES" first; just being able to throw UPS or NUT into a text box would enable quick cherry-picking of plugins.
A major advantage of Munin is that it ships as separate packages for gathering and analysing information, so you don't need to install a Web server on each node. The additional information at the bottom of the details page should also prod you if some statistic has a value that you should really pay attention too.
Of these four applications, Cacti offers the best Web interface, letting you select the time interval displayed on your graphs from more predefined settings, and it also lets you explicitly nominate the start and end time you are interested in. By contrast, in collectd, the emphasis is squarely on monitoring your systems, and the provided Web interface is offered purely as an example that might be of interest. Given that collecting and analysing data can be thought of as separate tasks, it would be wonderful if collectd and Cacti could play well together. Unfortunately, setting up Cacti to use collectd-generated files is a long, manual, error-prone process. While both Cacti and collectd are useful projects by themselves, I can't help but think that the combination of the two would be greater than its parts. Monitorix and Munin are easy to install and offer a quick overview of a host, but Monitorix's three graph per row aggregate presentation gives you a better high-level view of your system.
Which one might be best for you? If you spend a lot of time in data analysis or if you plan to allow non-administrators to get a glance of the system statistics, Cacti might be a good project to look into first. If you want to gather information on a system that is already under heavy load, see if collectd can be run without disrupting your system. Munin's support for gathering information from many nodes using different application packages makes it interesting if you are monitoring a small group of similar machines. If you have a single server and want a quick overview of what is happening, either Cacti or Monitorix is worth checking out first.
Ben Martin has been working on filesystems for more than 10 years. He completed his Ph.D. and now offers consulting services focused on libferris, filesystems, and search solutions.
Note: Comments are owned by the poster. We are not responsible for their content.
Four winning ways to monitor machines through Web interfaces
Posted by: Anonymous [ip: 24.130.151.91] on November 04, 2008 04:50 PM#