May 16, 2005

The Fourth Commandment of system administration

Author: Brian Warshawsky

The role of system administrator is a role of details. Heavily used and frequently updated servers are full of them, from new tables in a database to root password changes, and those details need to be documented. When you manage three servers, they can be easy enough to remember. When you have 30 or 50 or 100 servers, they become impossible to keep track of without writing them down. When it matters, you don't want to think that the IP address of that old accounting server is 192.168.10.55; you want to know it.

IV. Thou shalt keep server logs on everything

The only way to ensure that you know the details of all your servers all of the time is to write them down! At my current job, I take care of this using a simple OpenOffice.org spreadsheet. I have four racks of servers, so I use one sheet per rack to track hardware platforms, IP addresses, DNS names, dates of last update and patch installs, operating system, running services, open ports, and other aspects of the machines. It took a day to set it up and get it right, and I update it as I touch servers for various reasons.
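As a rough illustration of why a structured inventory pays off, here is a minimal Python sketch that answers the "what is that server's IP?" question from a CSV export of such a spreadsheet. The column names, the hostnames, and every value except the 192.168.10.55 address are made up for the example; the spreadsheet described above is not laid out in any way this code depends on.

```python
import csv
import io

# Hypothetical CSV export of one rack's sheet. Columns and rows are
# invented for illustration; only the accounting server's IP comes
# from the article.
SAMPLE = """\
rack,hostname,ip,os,last_patched,services
1,accounting-old,192.168.10.55,Linux,2005-04-02,postgresql;ssh
1,web01,192.168.10.20,Linux,2005-05-01,apache;ssh
"""


def load_inventory(handle):
    """Read inventory rows from an open CSV file into a dict keyed by hostname."""
    return {row["hostname"]: row for row in csv.DictReader(handle)}


def lookup_ip(inventory, hostname):
    """Return the recorded IP address, or None if the host is not documented."""
    row = inventory.get(hostname)
    return row["ip"] if row else None


inventory = load_inventory(io.StringIO(SAMPLE))
print(lookup_ip(inventory, "accounting-old"))
```

In practice you would pass an open file instead of the in-memory sample. A lookup that returns None is itself useful: it tells you a machine has never been written down.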

This works fine for broad topics, but there are often changes on a more individual scale that don't fit well in a spreadsheet. To track those, I have started keeping a text file in my home directory on each server that lists changes such as user accounts that have been created, log inconsistencies, and other details that are important to remember.
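A per-server notes file is most useful when every entry carries a timestamp. The sketch below shows one way to append such entries from Python. The file name server-notes.txt, the log_change helper, and the sample messages are all assumptions for the example, not a format the article prescribes, and the demo writes to a temporary directory rather than a real home directory.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def log_change(notes_file, message, when=None):
    """Append one timestamped line to the notes file and return that line."""
    when = when or datetime.now()
    line = f"{when:%Y-%m-%d %H:%M} {message}\n"
    # Mode "a" creates the file if needed and never overwrites old entries.
    with open(notes_file, "a", encoding="utf-8") as handle:
        handle.write(line)
    return line


# Demo with fixed timestamps so the result is reproducible.
with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / "server-notes.txt"  # hypothetical file name
    log_change(notes, "created user account jdoe", datetime(2005, 5, 16, 3, 0))
    log_change(notes, "noticed gap in mail log", datetime(2005, 5, 16, 3, 20))
    entries = notes.read_text(encoding="utf-8").splitlines()

print(entries)
```

Opening the file in append mode is the design choice that matters here: the notes become a running history, so an earlier entry is never lost when a later one is added.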

To avoid losing the file if I lose the server unexpectedly, I set my SSH client to log the session whenever I make serious changes to a server. That captures both the changes to the log files and the actual commands I issue during the session. I then copy these files from my laptop to my desktop in the datacenter and to a machine at home. This redundancy means not only that I can reach the notes from whichever machine I happen to be using, but also that even if I lose two of the machines, I still have the server logs.

This kind of redundancy might seem extreme, but after getting a page about a downed service (remember the Third Commandment?) at 3 o'clock in the morning, I'm in no mood to go searching for such details. Besides, for reasons we'll get into later, syncing the log files doesn't take much effort at all.

The commandments so far:
I. Thou shalt make regular and complete backups
II. Thou shalt establish absolute trust in thy servers
III. Thou shalt be the first to know when something goes down
IV. Thou shalt keep server logs on everything
