Author: Brian Warshawsky
V. Thou shalt document complete and effective policies and procedures
In the past I have found documented policies especially useful at two different times. The first is at the inception of a project. Before the system goes into production, sometimes even before the hardware is bought, detail in writing exactly what you need the server to accomplish, where its performance bottlenecks will be, and how you intend to correct them. This will allow you (and upper management!) to know that your time is not being spent chasing a fantasy implementation that will never work. It also helps you to better understand the nature of the beast you’re building. If anything goes wrong during the installation and configuration process (and something always does), you’ll be better prepared to deal with it simply because of the understanding you gained by mapping everything out beforehand. At this point you don’t need anything more than an outline (sometimes in the form of a project plan) and a few diagrams to guide you. For a much larger-scale implementation, though, you’ll need a detailed project plan dividing the entire process into phases. For instance, a large-scale Beowulf cluster would require a detailed project plan, while a new intranet Web server might only require a brief outline of configuration tasks and a diagram showing how it’s integrated into the network.
The second time that these policies are important is after the server has been configured and is ready to go into a production environment. At this point, before it is rolled out, take some time to create detailed step-by-step documents explaining the backup restoration process, the steps necessary to restart a service (or just a list of important services that might need to be restarted, depending upon the experience of your backup admins), and anything else that might be helpful. Just remember that you won’t always be available to fix something; having detailed instructions for common problems or routine exercises can make the difference between 10 minutes of downtime and a week and a half of it if you are unavailable.
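One way to keep such step-by-step documents consistent is to store each procedure as structured data and render it into a plain-text runbook. The following is only a minimal sketch of that idea, not a prescribed tool: the `Procedure` class, the `render_runbook` function, the server name `web01`, and the example steps are all hypothetical placeholders to be replaced with your own procedures.

```python
from dataclasses import dataclass


@dataclass
class Procedure:
    """One documented task, such as restoring a backup or restarting a service."""
    title: str
    steps: list[str]
    contact: str = "unassigned"  # who to call if the steps fail (hypothetical field)


def render_runbook(server: str, procedures: list[Procedure]) -> str:
    """Render the procedures as a numbered, plain-text runbook."""
    heading = f"Runbook for {server}"
    lines = [heading, "=" * len(heading), ""]
    for number, proc in enumerate(procedures, start=1):
        lines.append(f"{number}. {proc.title} (owner: {proc.contact})")
        for step_number, step in enumerate(proc.steps, start=1):
            lines.append(f"   {number}.{step_number} {step}")
        lines.append("")  # blank line between procedures
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder content: substitute the real steps for your own server.
    runbook = render_runbook(
        "web01",
        [
            Procedure(
                "Restore from backup",
                ["Locate the most recent backup", "Restore it", "Verify the data"],
                contact="primary admin",
            ),
            Procedure(
                "Restart the Web server service",
                ["Log in to the server", "Restart the service", "Check the logs"],
                contact="backup admin",
            ),
        ],
    )
    print(runbook)
```

Keeping the procedures in one place like this makes it easier to review them for gaps before rollout and to print a copy for whoever is covering while you are away.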
The commandments so far:
I. Thou shalt make regular and complete backups
II. Thou shalt establish absolute trust in thy servers
III. Thou shalt be the first to know when something goes down
IV. Thou shalt keep server logs on everything
V. Thou shalt document complete and effective policies and procedures