The Ten Commandments of system administration, part I


Author: Brian Warshawsky

As system and network administrators, we play many roles. We are the ones who provide stable and secure environments for electronic business in all of its forms, from email to accounting systems to mission-critical Web applications. However, despite our best efforts, disaster will occasionally strike. In this series, I’ll present what I feel are the 10 most important steps you can take as a system administrator to ensure that when that dreaded 3 a.m. page hits, you’re prepared to react quickly, assess the situation, and make everything right again. I call these the Ten Commandments of system administration.

You know the situation. You might be in the middle of rebooting after something as benign as adding RAM to your server, when all of a sudden your root filesystem refuses to mount, and fsck isn’t helping. Or perhaps a hard disk fails and six months of work on a project goes with it. Events like these are rare, but when they do occur, you’ll be glad you prepared ahead of time.

Many of these precautions may seem elementary or obvious to you, but too often our job as system administrators doesn’t give us the time to properly plan or fully configure servers and network infrastructure for deployment, and as a result, things we meant to go back and take care of later slip through the cracks.

It is not my intention to provide step-by-step instructions on each of these topics, but merely to suggest what has worked for me and point you in the right direction. As the administrator at your site, you are best placed to evaluate your own situation and environment and to determine the proper course of action for your organization. I invite you to share your tips with readers through your comments below.

I. Thou shalt make regular and complete backups

This is probably the most fundamental part of system administration, and that’s why I’ve chosen to mention it first. Managing backups is one of the most important roles you’ll undertake as a system administrator. As any well-seasoned IT professional can tell you, nothing matches the wrath and fury of a manager who has just lost all his email. Email isn’t the only thing that requires backups, of course. Databases are becoming increasingly important in driving applications of all types, ranging from Web apps to customer and billing records. All of this data is essential to the continued functioning of an organization, and that’s what makes backups so important.

There are many ways you can implement backups in your organization, and what works best for me will not necessarily work best for you. Having said that, here are two ways I go about backing up data for my organization.

Since I administer a variety of servers running a variety of operating systems, I’ve implemented two separate backup routines. The first uses an enterprise software package, Backup Exec by Veritas, for the Windows machines. The second uses rsync to create mirrored directories on remote RAID 5 hosts. Rsync is a fairly flexible backup tool, and it is what I will focus on here.

The power of rsync lies in its ability to do incremental backups of large filesets without the network and disk overhead of more traditional methods such as tar and ufsdump/ufsrestore, because it transfers only what has changed since the previous run. An added bonus is that rsync can work on a live filesystem, though you should try to schedule backups for times that see the lowest utilization and traffic.

My setup is as follows. I have a Linux server with a RAID 5 array that serves as both a backup server and a remote syslog server. Three other production *nix servers use rsync each night to make an incremental backup of important files and directories across the network to the Linux machine. I run and manage the backups from each production server, rather than from the backup host, so that I never accidentally back up the wrong part of a server. Here’s an example of the rsync command that runs every night to copy data across to the Linux server:

rsync -e ssh -avz --delete /srv/www/htdocs/ sphinx:/backups/webserver

The -e ssh flag tells rsync to use SSH as its transport method. This is a convenient way of encrypting your data transfer to keep everything safe. The -a flag instructs rsync to operate in archive mode, which keeps the file permissions, modification dates, and other attributes identical to the original files. The -v flag makes rsync list each file as it is transferred, which is useful when the output is logged. The -z flag tells rsync to compress the data in transit for faster transfers.

The --delete option instructs rsync to remove any files from the destination that are no longer found in the source. It’s important that you read and fully understand the consequences of this flag, because a trailing slash on the source directory changes what rsync copies: with the slash, the contents of the directory are copied into the destination; without it, the directory itself is created inside the destination. Point the command at the wrong directory level with --delete in place, and rsync can remove files on the destination that you meant to keep. Make sure you have a firm understanding of the various uses of rsync before deploying it in a production environment, as data loss from a mistyped slash combined with --delete is easier to fall into than you might think.

Finally, we’re left with /srv/www/htdocs/, the source directory from which everything will be copied, and sphinx:/backups/webserver, the host name and path of the backup directory.
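
To see the trailing-slash behavior for yourself before trusting --delete on real data, something like the following can be run on any machine with rsync installed. It is only a sketch: every path is a throwaway scratch directory invented for the demonstration, and nothing here touches the servers described in this article. The last command uses -n (the short form of --dry-run) to preview what --delete would remove without changing anything.

```shell
# Scratch directories; every path here is a throwaway made up for the demo.
demo=$(mktemp -d)
mkdir -p "$demo/htdocs" "$demo/with_slash" "$demo/without_slash"
echo "hello" > "$demo/htdocs/index.html"

# Trailing slash on the source: the CONTENTS of htdocs land in the destination.
rsync -a --delete "$demo/htdocs/" "$demo/with_slash"

# No trailing slash: the htdocs directory itself is created inside the destination.
rsync -a --delete "$demo/htdocs" "$demo/without_slash"

# Compare where index.html ended up in each case.
find "$demo/with_slash" "$demo/without_slash" -type f

# Add a file that exists only on the destination, then preview the deletion.
# With -n nothing is removed; rsync only reports what it would do.
echo "stale" > "$demo/with_slash/old.html"
rsync -a -n -v --delete "$demo/htdocs/" "$demo/with_slash"

# When finished: rm -rf "$demo"
```

Running the real nightly command once with -n added is a cheap way to check that the source and destination are at the directory level you intended.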

When you’re designing a backup policy around rsync, set up trusted SSH sessions (for example, with key-based authentication) ahead of time, so you don’t have to include a password within shell scripts or repeatedly re-enter it. Also, schedule regular backups with cron so that your data is archived without constant input from you. If you’d like a look at an extremely well-designed rsync backup policy, you can find one here.
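
As a rough illustration of how these two pieces might fit together, here is a minimal wrapper script. It is a sketch under several assumptions that are not part of my actual setup: the key file name (backup_key), the script path and log path in the crontab comment, and the 02:30 start time are all invented for the example, and the key would need an empty passphrase (or an agent) for cron to use it unattended. The source and destination default to the ones from the command shown earlier.

```shell
#!/bin/sh
# One-time setup, run by hand on the machine being backed up
# (key file name is a made-up example):
#   ssh-keygen -t ed25519 -f ~/.ssh/backup_key
#   ssh-copy-id -i ~/.ssh/backup_key.pub sphinx
#
# Example crontab entry (script path, log path, and time are assumptions):
#   30 2 * * * /usr/local/bin/nightly-backup.sh --run >> /var/log/nightly-backup.log 2>&1

# Defaults can be overridden from the environment.
SRC="${SRC:-/srv/www/htdocs/}"
DEST="${DEST:-sphinx:/backups/webserver}"
KEY="${KEY:-$HOME/.ssh/backup_key}"
RSYNC="${RSYNC:-rsync}"

nightly_backup() {
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # so a broken key shows up as an error in the log rather than a hung job.
    "$RSYNC" -e "ssh -i $KEY -o BatchMode=yes" -avz --delete "$SRC" "$DEST"
}

# Only start a transfer when explicitly asked, so the file can be
# sourced and inspected without side effects.
if [ "${1:-}" = "--run" ]; then
    nightly_backup
fi
```

Keeping the rsync invocation in one function means the same script can be run by hand with --run to test it before handing it over to cron.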

Next week: The Second Commandment