June 8, 2004

Linux system backup for Windows network admins

Author: Michael Jang

What you do to back up your computers depends on the value of your data and how far you're willing to go. You can hold users responsible for their own backups. Alternatively, you can create and store backups on floppy disks, CDs, tape drives, and so on. You can back up part or all of the data, even in real time.

This article is excerpted from the recently published book Linux Transfer for Windows Network Admins.

You can configure a Redundant Array of Independent or Inexpensive Disks (RAID) to help you protect your data in real time. You can store important data on portable, writable media such as floppy disks and writable CDs. You can create and store backup data on remote computers on your network. You can even store the computers or portable media in remote physical locations.

The commands described in this article are administrative; unless otherwise noted, I'm assuming that you've logged in to your Linux computer as the administrative root user.

Backup strategies and types

What you do to back up the computers on your network depends on the time and money you have available. If you're administering a time- and mission-critical network, you may want to set up a very high-speed connection to a remote site to replicate your data in real time. If your data is not as valuable, a weekly or even monthly backup may be sufficient. In addition, I'll show you two RAID arrays that can preserve your data on the local computer in real time.

In this article, I assume that you want to back up the contents of the /home directory. While it can help to back up the contents of other directories, /home by and large contains the data from your users.

In this basic scenario, if you lose a file server and need to restore it from scratch, you would take the following basic steps:

  1. Fix the hardware. This may mean replacing the hard drive or configuring a new computer.
  2. Reinstall the operating system on the computer.
  3. Reinstall any updates to the operating system. If it's Red Hat Linux, you'll need to use the Red Hat Update Agent.
  4. Reinstall any applications that you've added to the operating system, beyond those you may have included when you installed Red Hat Linux (such as the OpenOffice.org suite).
  5. Restore the data that you've backed up, presumably from the /home directory.

Backup strategies

The most straightforward backup involves copying all data on your hard drives. But hard drives now contain gigabytes and even terabytes of data. Backing up every byte on a larger hard drive can take hours. If you back up your hard drive to a remote computer, the resulting traffic could effectively shut down the network for your users.

Backups by system administrators are generally driven by four factors:

  • Timing. You want to schedule backups during the hours when your users don't need your file server or network.
  • Need. What you back up depends on the importance and timeliness of the data.
  • Cost. Backing up everything on your hard drives will drive up the cost of backup media, as well as the demands on your network.
  • Size. Larger backups are more difficult to manage. Depending on the speed of your network and the size of your hard drives, it may not be possible to completely back up your computer overnight, or even over a weekend.

Most administrators use a combination of backup strategies. The files associated with the Linux operating system don't change unless you've used the Red Hat Update Agent, or have otherwise downloaded and installed new RPM packages. Therefore, you may not need to back up all of the files on your computer on a daily basis.

When setting up a backup, many administrators will back up everything on a hard drive weekly or monthly. If a hard drive is too large, they may limit the backup to all user files, which on a Linux computer can be found under the /home directory. They will then back up new and changed files more often.
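The following sketch shows that pattern (a periodic full backup plus more frequent backups of changed files) using only tar and find. It keeps a timestamp file next to the full archive and later archives only the files modified after that stamp. The function names, the stamp file, and the archive names are hypothetical. The -print0, --null, and -T - options assume GNU find and GNU tar, which are the versions Red Hat Linux ships.

```shell
#!/bin/sh
# Sketch: a full tar archive plus "changed since the last full backup" archives.
# full_backup, changed_backup, last-full.stamp and the archive names are hypothetical.

full_backup() {      # $1 = directory to save, $2 = backup directory
    mkdir -p "$2" || return 1
    touch "$2/last-full.stamp"            # remember when this full backup started
    tar czf "$2/full.tar.gz" "$1"
}

changed_backup() {   # $1 = directory to save, $2 = backup directory
    # Select regular files modified after the stamp; -print0 and --null keep
    # names containing spaces or newlines intact (GNU find and GNU tar).
    find "$1" -type f -newer "$2/last-full.stamp" -print0 |
        tar czf "$2/changed-$(date +%Y%m%d).tar.gz" --null -T -
}
```

For example, you might run full_backup /home /backup from a weekly cron job and changed_backup /home /backup each night. This approach detects changes by modification time only, and it does not record files that were deleted.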

RAID

Backing up and restoring data takes time. If you have a data disaster, your users may have to wait until you can restore their data from backups. There are alternatives. Several different types of RAID arrays can save data in real time. Your users can still access their data even if a single hard drive fails.

You can set up both hardware and software RAID arrays. A hardware RAID array consists of different physical hard drives. For that reason, in a redundant array such as RAID 1 or RAID 5, the failure of a single hard drive does not cause you to lose the data in that array. A software RAID array consists of different partitions. While you can configure a software RAID array with partitions on the same physical hard drive, I don't recommend it. If you do, any problem with that physical hard drive can lead to the loss of all your data on that array.

Red Hat Linux supports three basic types of software RAID arrays, known as RAID 0, RAID 1, and RAID 5. An array is a group of hard drives or partitions configured together. A RAID 0 array is used to combine two or more hard drives or partitions in a single volume. While it is faster, it does not help to protect data. A RAID 1 array is used to configure two hard drives or partitions with identical data. A RAID 5 array is used to configure three or more hard drives or partitions. It stores parity information across the members, so your data survives the failure of any one of them.
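The following toy example (not real RAID code) illustrates how RAID 5 can rebuild a failed member. The parity block is the exclusive OR (XOR) of the data blocks, so any single missing block equals the XOR of all the others. The byte values are arbitrary. A real RAID 5 array applies the same idea to whole stripes and rotates the parity block across its members.

```shell
#!/bin/sh
# Toy illustration of RAID 5-style parity on single bytes.
d1=0xA5                    # byte stored on disk 1 (arbitrary value)
d2=0x3C                    # byte stored on disk 2 (arbitrary value)
parity=$(( d1 ^ d2 ))      # parity byte stored on disk 3

# Suppose disk 1 fails: rebuild its byte from the surviving disks.
rebuilt=$(( parity ^ d2 ))
printf 'parity=0x%02X rebuilt=0x%02X original=0x%02X\n' "$parity" "$rebuilt" "$d1"
# prints: parity=0x99 rebuilt=0xA5 original=0xA5
```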

Configuring RAID on Red Hat Linux

You can set up a RAID array during the installation process. With the Red Hat Linux installer, you can set up a software RAID array on different partitions.

If you want to create or reconfigure a RAID array after installing Red Hat Linux, you'll need skills that are beyond the scope of this book. Because it is an important skill for a system administrator, I'll just outline the required steps here. A full description of the required steps is available in my book Mastering Red Hat Linux 9, published by Sybex.

  1. Install one or more new physical hard drives.
  2. After booting Linux, create a partition on each new physical hard drive. Make sure that the size of each partition is about equal to the other partitions in the array. You can use either fdisk or parted for this purpose; both utilities are located in the /sbin directory. I recommend fdisk as being more reliable. In fdisk, make sure to set the partition type to fd, which corresponds to the "Linux raid autodetect" type. Whether you use fdisk or parted, be very careful. A small mistake with either utility can easily destroy all of the data on any partition or hard drive on your computer.
  3. Format the partitions that you've just created. For example, if you've created the /dev/sde1 and /dev/sdf1 partitions to expand the size of the array, you'll need to run the following mkfs commands to format those partitions:

    # mkfs -j /dev/sde1
    # mkfs -j /dev/sdf1
  4. Configure the partitions in a RAID array. If you've already installed a software RAID array, all you need to do is modify the /etc/raidtab configuration file. Red Hat Linux 9 includes a sample RAID configuration file in raidtab.sample, located in the /usr/share/doc/raidtools-1.00.3 directory. You can edit this sample file and save it as /etc/raidtab.

  5. Create the RAID device. RAID devices use device file names such as /dev/md0 and /dev/md1. For example, if you're creating the third RAID device on your system, you'd run the following command:

    # mkraid -R /dev/md2

  6. Format the RAID device. This part is fairly straightforward, because it's analogous to the previous mkfs commands used to format individual partitions:

    # mkfs -j /dev/md2

  7. Now that you've created and formatted a RAID array, you're ready to mount the directory of your choice on that array. For example, if you've added partitions to increase the size of a RAID array for the /home directory, you could run the following commands. They copy the contents of the /home directory to the /tmp directory, after which you could mount the new RAID array.

    # cp -ar /home /tmp
    # mount /dev/md2 /home
  8. Now you can copy the files back from /tmp to the original location.

    # cp -ar /tmp/home /

  9. Finally, you can set up the new /home directory in your /etc/fstab configuration file. Details of this process are also beyond the scope of this book. The next time you boot into Linux, make sure that Linux mounts your /home directory from the appropriate RAID array.
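For step 4, a raidtools /etc/raidtab entry for the new array might look like the stanza below. This sketch assumes a two-partition RAID 1 array built from the example partitions /dev/sde1 and /dev/sdf1, with a persistent superblock. Check raidtab.sample and the raidtab man page for the settings your array needs. The script writes to a scratch file (raidtab.new, a hypothetical name) so you can review the stanza before appending it to /etc/raidtab.

```shell
#!/bin/sh
# Sketch of a raidtools stanza for /dev/md2 as a two-partition RAID 1 array.
# RAID level 1 and chunk-size 4 are assumptions; adjust them for your array.
OUT=${OUT:-raidtab.new}

cat > "$OUT" <<'EOF'
raiddev /dev/md2
    raid-level              1
    nr-raid-disks           2
    nr-spare-disks          0
    persistent-superblock   1
    chunk-size              4
    device                  /dev/sde1
    raid-disk               0
    device                  /dev/sdf1
    raid-disk               1
EOF

echo "Review $OUT, append it to /etc/raidtab, then run: mkraid -R /dev/md2"
```

A matching /etc/fstab line for step 9 could then be /dev/md2 /home ext3 defaults 1 2. This assumes you formatted the array with mkfs -j, which creates an ext3 filesystem.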

Backup media options

Choosing the media that you might use to back up your data depends on the amount of data that you want to keep safe. Individual users who work with text files may need only the space associated with a 1.44MB floppy disk. The same may hold true if you're backing up critical configuration files from the /etc directory, such as inittab, fstab, grub.conf, and the files in the /etc/samba subdirectory. (The /etc/grub.conf file is actually linked to the /boot/grub/grub.conf file. You can open and edit either file name; the result is saved in /boot/grub/grub.conf.)
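Before you copy configuration files to a floppy, you can check whether they fit. The sketch below measures the size of a gzip-compressed tar archive of the files and compares it with the 1,474,560 bytes (2 sides x 80 tracks x 18 sectors x 512 bytes) on a 1.44MB floppy. The byte count assumes the archive is written directly to /dev/fd0, as the cpio examples later in this article do, rather than into a filesystem on the disk. The file list comes from the text. The archive_size function name is hypothetical.

```shell
#!/bin/sh
# Sketch: will a compressed archive of these files fit on a 1.44MB floppy?
FLOPPY_BYTES=1474560          # 2 x 80 x 18 x 512 bytes

archive_size() {              # prints the byte size of a gzip'd tar archive of "$@"
    tar czf - "$@" 2>/dev/null | wc -c | tr -d ' '
}

size=$(archive_size /etc/inittab /etc/fstab /boot/grub/grub.conf /etc/samba)
if [ "$size" -le "$FLOPPY_BYTES" ]; then
    echo "fits: $size bytes"
else
    echo "too large: $size bytes"
fi
```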

A number of other types of media are available that you can use with a Linux computer. The variations are endless; I cite only the "typical" size for each media type:

  • Zip drives (from Iomega) are normally 100MB and are suitable for backing up other critical files such as the Linux kernel from the /boot directory. The standard Red Hat Linux installation allocates 100MB to the /boot directory.
  • Bernoulli drives normally include 230MB of space, which might be enough to back up user files from the /home directory on some smaller networks.
  • Writable CD media can store around 650MB of data. That is enough for all but the data files on many dedicated Red Hat Linux servers where the GUI is not installed.
  • Writable DVD media vary in size. They can contain 4.7GB to 17GB of data, which can be used to back up many Linux servers, including data files.
  • Tape drives are available in a variety of sizes. As of this writing, I've seen single tape drives that can contain up to 300GB of data.
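To turn those typical capacities into a media count, divide the amount of data by the capacity of one medium and round up. The sketch below does this in shell integer arithmetic. It assumes GNU du for the -m (megabytes) option, and 4700MB is an approximation of a 4.7GB DVD. The media_count function name is hypothetical.

```shell
#!/bin/sh
# Sketch: how many pieces of media a backup needs, rounding up.
media_count() {    # $1 = data size in MB, $2 = capacity of one medium in MB
    echo $(( ($1 + $2 - 1) / $2 ))
}

data_mb=$(du -sm /home 2>/dev/null | cut -f1)   # size of /home in MB (GNU du)
data_mb=${data_mb:-0}
echo "Zip (100MB):  $(media_count "$data_mb" 100)"
echo "CD (650MB):   $(media_count "$data_mb" 650)"
echo "DVD (4700MB): $(media_count "$data_mb" 4700)"
```

For example, media_count 10000 650 prints 16: 10GB of data needs 16 CDs, because 15 CDs hold only 9,750MB.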

The media that you select depends on the money you have available, as well as your backup hardware. You also need to consider your specific environment. For example, if the only place where you can store your backups is exposed to strong magnetic fields (such as a factory or a machine shop), you may want to stick with CDs or DVDs.

System backups

There are a number of ways to back up larger file servers. The most direct way is to use third-party hardware to back up all the data on the server to some high-capacity media. You may also want to back up the hardware associated with your file server. This is possible courtesy of two different open-source projects: the High Availability Linux project and the Linux Virtual Server project.

Third-party hardware

If the hardware on your system, such as a CD or a DVD writer, is not enough, there are a number of third-party options available. For example, some systems can combine dozens of writable CDs, DVDs, and tape drives to save the data from every computer on your network.

A wide variety of hardware options are available. I use the data from www.storagesearch.com in my research. This site is published by Applied Computer Science, Ltd. of the United Kingdom. In addition to those previously described, some of the backup hardware types include:

  • FireWire: One common option is to back up personal computers onto a portable high-capacity hard drive. This type of drive uses connections that correspond to the IEEE 1394 standard, also known as i.LINK. FireWire storage devices carry SCSI commands over this high-speed serial bus; I use one to back up data at burst speeds of nearly 400 Mbps. Other types of FireWire drives can write data to other media such as writable DVDs. (Linux support for FireWire, as well as for the alternative USB 2.0 devices, is experimental as of this writing.)
  • Removable hard drives: If you've configured a RAID 1 array of removable hard drives, you can remove one of the drives in the array and store that drive in some secure remote location. If you have a spare hard drive in that array, a RAID 1 system automatically writes your data to that drive. You can then set up a new hard drive as a spare. Larger groups of removable hard drives are often configured as rack-mount storage.
  • Jukeboxes: These are "storage cabinets" that allow you to combine the data capacity of a group of media types, such as writable tape, DVDs, and CDs.
  • Storage area networks (SAN): A high-speed network of storage devices; many are in common use for higher-capacity Linux systems. While this isn't explicitly a backup option, it is a common way to organize a RAID system.

High Availability Linux project

The High Availability Linux project is one way to set up multiple computers as a cluster. The cluster of computers appears like a single server to the other computers on the network. They often share a common external storage medium, usually with a SCSI connection.

If one computer in the cluster fails, another computer takes over automatically. Client computers on the network don't know which computer in the cluster is working as the file server. Because the computers in the cluster share the same external storage, clients see no difference in the data.

While this does not back up your data, it does serve as a backup for your hardware. For example, if you set up a Primary Domain Controller (PDC) in a cluster, the failure of one computer does not affect your network.

Linux Virtual Server project

The Linux Virtual Server project is another way to set up multiple computers as a cluster. This cluster of computers also appears like a single server to the other computers on the network. It also can be configured with a common external storage medium. One additional feature supports load balancing. In other words, if a lot of clients are working through one server in the cluster, load balancing sends additional clients to less busy servers in the cluster.

As with the High Availability Linux project's heartbeat software, if one computer fails, other computers take over automatically. The cluster of computers appears as one file server to the other computers on the network.

Red Hat Enterprise Linux (RHEL) Advanced Server supports its own version of the Linux Virtual Server project. Red Hat developed its Enterprise Linux distributions for higher-capacity systems, basing them on older versions of Red Hat Linux. When the book from which this article is excerpted was written, the latest version of RHEL Advanced Server was 2.1, which is based on Red Hat Linux 7.2. RHEL 3 should be available by now; it is based on the primary operating system used in this book, Red Hat Linux 9.

Backup and restore commands

As of this writing, Red Hat Linux does not include any GUI tools for backing up or restoring data. However, the Nautilus file browser does support writing to a writable CD. In general, to perform backups, you'll need to work from the command-line interface. Text commands support scheduled backups through the cron daemon. Fortunately, the commands are not difficult. Once you've configured connections to remote computers, you can back up any directories shared from those computers.

Let's start by looking at some basic backup commands. You can create archives with the tar or cpio commands. You can dump and restore data to and from a tape drive. And you can record data to writable CDs and DVDs by using the cdrecord and dvdrecord commands.

In most cases, Linux backups involve a two-step process. First you need to create a file, such as an archive of the files or directory that you want to save. Then you can save the archive to media such as a tape drive or a writable CD.
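The two steps might look like the sketch below. make_archive builds a compressed tar archive. burn_archive wraps that archive in an ISO 9660 image with mkisofs and records the image with cdrecord. The function names and the /tmp/backup.iso path are hypothetical. The device address dev=0,0,0 is an assumption; run cdrecord -scanbus to find the bus,target,lun address of your own writer.

```shell
#!/bin/sh
# Step 1: collect a directory into a compressed tar archive.
make_archive() {   # $1 = directory to save, $2 = archive file to create
    tar czf "$2" "$1"
}

# Step 2: wrap the archive in an ISO image (-r adds Rock Ridge file
# attributes) and record it to CD. dev=0,0,0 is an assumed writer address.
burn_archive() {   # $1 = archive file; the image must fit on one CD (~650MB)
    mkisofs -r -o /tmp/backup.iso "$1" &&
        cdrecord -v dev=0,0,0 /tmp/backup.iso
}

# Example (as root):
#   make_archive /home /tmp/home.tar.gz && burn_archive /tmp/home.tar.gz
```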

No matter what method or media you use for backup, make sure you can restore it. The small amount of time you spend testing your backup media can save you a lot of frustration down the road.
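For tar archives, one quick test is to read the whole archive back and compare it with the files on disk. The sketch below assumes GNU tar, whose d (--diff) operation reports members that differ from the filesystem. It also assumes the archive was created from an absolute path, such as /home/waymon, which tar stores as home/waymon, so the comparison runs from the root directory. The verify_archive function name is hypothetical.

```shell
#!/bin/sh
# Sketch: confirm that an archive is readable and still matches the disk.
verify_archive() {  # $1 = archive file
    case $1 in
        /*) archive=$1 ;;
        *)  archive=$PWD/$1 ;;          # make the path absolute before cd /
    esac
    # Reading the full listing catches truncated or corrupt archives.
    tar tzf "$archive" > /dev/null || { echo "cannot read $archive" >&2; return 1; }
    # GNU tar's d (--diff) compares each member with the file on disk; it runs
    # from / because tar stripped the leading slash when creating the archive.
    (cd / && tar dzf "$archive")
}
```

A nonzero exit status means the archive could not be read or that files have changed since it was made. Run the check right after the backup to confirm the media itself.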

tar archives

The tar command collects a group of files, usually the contents of a directory, into a single archive file. With the right switches, you can also compress that archive. In that way, the tar command is analogous to the WinZip utility associated with Microsoft Windows.

Users can run the tar command to archive the files in their own directories. For example, if user waymon wanted to back up the files in his home directory to waymon.tar.gz, he could use the following command:

$ tar cvzf waymon.tar.gz /home/waymon

You may notice something a little strange with the tar command; it does not require a dash in front of the switches. Several Linux commands work this way.

There are four switches associated with this particular command. It creates (c) the backup in the noted archive file (f), waymon.tar.gz. It runs the command in a verbose (v) way, which lists the files that are being collected in the archive. It then compresses the result (z), which reduces the space taken by the archive.

You may notice that compressed tar archives are often given two consecutive file extensions. The ".tar" indicates that the archive was created by the tar command. The ".gz" indicates that it was compressed with gzip, which is what the z switch does.

Finally, it's important to cite the absolute path to the directory that you want to archive. Absolute paths start with the slash, /, which Linux reads as starting from the top-level root directory (/). By default, however, GNU tar strips that leading slash when it creates the archive, so it restores files relative to your current directory. To put the files back in their original location, run the restore from the root directory (/) or add -C / to the tar command.

Naturally, you can reverse the process. The following command restores from the compressed waymon.tar.gz archive:

$ tar xkvzf waymon.tar.gz

This command extracts (x) the files from the archive. It does not overwrite (k) any existing files. It works in verbose (v) mode so you can watch as tar restores from your archive. It works from a compressed (z) archive, from the (f) file cited, waymon.tar.gz. To list the contents of the archive without restoring anything, substitute t for x.
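The round trip below runs in a scratch directory, so it is safe to try. It creates an archive with the c, z, and f switches, lists it with t, and extracts it with x and k into a separate directory. The directory layout and file contents are hypothetical stand-ins for /home/waymon.

```shell
#!/bin/sh
# Round trip in a scratch directory (the layout is a hypothetical example).
work=$(mktemp -d)
mkdir -p "$work/home/waymon"
echo "meeting notes" > "$work/home/waymon/notes.txt"

cd "$work"
tar czf waymon.tar.gz home/waymon     # create (c) a gzip'd (z) archive file (f)
tar tvzf waymon.tar.gz                # list (t) the members; nothing is written
mkdir restore
tar xkvzf waymon.tar.gz -C restore    # extract (x) into restore/, keeping (k) existing files
```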

The tar command is substantially more complex; for more information, open the tar manual from the command-line interface with the man tar command.

cpio archives

The cpio command can help you archive a group of files. The group can be in a single directory, or it can include all files that match a single pattern of alphanumeric characters. The name stands for "copy in, copy out": cpio copies files from its input to its output.

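The find command lists the files on your system, based on a search term. For example, the following command searches for all of the files on your computer with a .jpg extension. The quotes keep the shell from expanding *.jpg before find sees it. Use -iname instead of -name if you also want to match files ending in .JPG.

# find / -name '*.jpg'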

You can now use the cpio command to collect these files. The following command combines find and cpio. The output from the find command is "piped" as input to the cpio command. The pipe is the vertical line next to or just under the backspace key on a standard U.S. keyboard:

# find / -name '*.jpg' | cpio

Linux's bash shell includes a number of characters that you can use to connect the input and output of different commands. The use of the following characters requires an understanding of standard input, standard output, and standard error: >, >>, |, 2>, <. This topic is beyond the scope of this book. You can find out more in my other book, Mastering Red Hat Linux 9.

But this command is not complete. You have input from the find command, but you need a place for the output. The following command runs cpio in copy-out mode (-o), which writes the archive to standard output, and then redirects (>) that output to the jpegs.cpio archive:

# find / -name '*.jpg' | cpio -o > jpegs.cpio

Alternatively, if you have a tape drive connected to your computer, you can send this output directly to that tape drive. Normally, the first tape drive on a Linux computer is associated with the /dev/st0 device file. Thus, you can archive all JPG files on your computer with the following command:

# find / -name '*.jpg' | cpio -o > /dev/st0

You could archive the files from your /etc/samba directory to a floppy drive. The first floppy drive on a Linux computer is normally associated with the /dev/fd0 device file. Thus, you can archive the files in your /etc/samba directory with the following command:

# find /etc/samba | cpio -o > /dev/fd0

The cpio command is substantially more complex; for more information, open the cpio manual from the command-line interface with the man cpio command.

Full, incremental, and differential backups

If you want to use incremental and differential backup schemes, use the dump and restore commands. If the amount of data is larger than the backup media, such as a tape or floppy drive, these commands support the use of multiple tapes or drives. As of this writing, Red Hat does not support the use of these commands with writable CDs or DVDs.

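In short, the dump command archives data, and the restore command copies the data back from the archive. This section assumes you've installed a tape drive on your Linux computer and that it's available as the /dev/nst0 device, the non-rewinding version of /dev/st0, which lets you write several backups one after another on the same tape. Tape drives aren't mounted, so they don't appear in /etc/fstab. You can check that the drive responds with the mt -f /dev/nst0 status command.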

dump archives

You can archive the directory of your choice by using the dump command. You can set up full, incremental, or differential backups with this command. Normally, the first step in this sequence is a full backup. For example, if you want to set up a full backup of the /home directory, you would run the following command:

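# dump 0uf /dev/nst0 /home

In the 0uf switch, the 0 sets the backup level, the u records the date of this dump in the /etc/dumpdates file, and the f sends the backup to the file or device that follows, here /dev/nst0. A level of 0 always results in a full backup; the level can vary between 0 and 9. Whether a later dump command creates an incremental or a differential backup depends on how its level compares with the levels of earlier dumps. A dump at level N saves every file created or changed since the most recent dump at a lower level, as recorded in /etc/dumpdates.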

Differential backups save all files created or changed since the last full backup. If you want to create a series of five differential backups, perhaps one for each workday, you could run the numbers in the dump command backward:

# dump 9uf /dev/nst0 /home
# dump 8uf /dev/nst0 /home
# dump 7uf /dev/nst0 /home
# dump 6uf /dev/nst0 /home
# dump 5uf /dev/nst0 /home

The number that you start with does not matter. As long as each level is lower than the one before it (but still above 0), every dump in the series backs up the files changed since the level 0 full backup, which makes it a differential backup.

In contrast, incremental backups save only the files created or changed since the last backup of any kind. If you want to create a series of five incremental backups, one for each workday, you would run the numbers forward:

# dump 1uf /dev/nst0 /home
# dump 2uf /dev/nst0 /home
# dump 3uf /dev/nst0 /home
# dump 4uf /dev/nst0 /home
# dump 5uf /dev/nst0 /home
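The level rule can be modeled in a few lines. The function below is an illustration of the rule, not part of dump. Given a new level and the levels of earlier dumps, oldest first, it prints the position of the earlier dump that the new dump looks back to: the most recent one with a lower level. A result of 0 means no earlier dump qualifies, so everything is copied. The reference_dump name is hypothetical.

```shell
#!/bin/sh
# Model of dump's rule: a level-N dump copies files changed since the most
# recent earlier dump whose level is lower than N.
reference_dump() {   # $1 = new level, then earlier levels, oldest first
    level=$1; shift
    pos=0; ref=0
    for prev in "$@"; do
        pos=$((pos + 1))
        if [ "$prev" -lt "$level" ]; then
            ref=$pos                     # latest lower-level dump so far
        fi
    done
    echo "$ref"
}

reference_dump 7 0 9 8   # prints 1: looks back to the level 0 full backup
reference_dump 3 0 1 2   # prints 3: looks back to the previous (level 2) dump
```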

The numbers don't have to be in sequence; all that matters is that the number associated with each dump command is higher than the one you used with the previous dump command. Keep in mind that dump supports levels other than 0 only when it backs up an entire filesystem, so these examples assume that /home is on its own partition.

To see how dump handles a backup that spans more than one volume, suppose you create a full backup of the /home/waymon directory on two floppy disks. You'd start with the following command:

# dump 0f /dev/fd0 /home/waymon

You'll see a long series of messages related to the backup. If the files in /home/waymon are too large for a single floppy drive, dump returns the following messages:

DUMP: Change Volumes: Mount volume #2
DUMP: Is the new volume mounted and ready to go?: ("yes" or "no")

When you see the message, insert a new floppy and type yes at the prompt to continue the full backup process. There are a substantial number of options for the dump command, related to everything from compression to date labels. For more information, you can read the dump manual by typing in man dump at the command-line interface.

restore archives

One quirk of the dump and restore commands is that restore writes files relative to your current directory, using paths that start at the top of the filesystem that was dumped. If /home/waymon is part of the root filesystem, make sure that you're in the top-level root directory (/) before you run restore. (If /home is on its own partition, change to /home instead.) Use the following command:

# cd /

Now you can restore from the device with the backup. I can restore from the backup that I created of the /home/waymon directory in the previous section. First I insert the first floppy disk of the backup, and then I run the following command:

# restore -rf /dev/fd0

This command restores (-r) the filesystem from the noted location (-f), in this case, the first floppy drive (/dev/fd0). If you haven't mounted /home/waymon on its own partition, you'll see a series of warning messages about other directories "not found on tape." If /home is mounted on its own partition, the restore command assumes that you're trying to restore every subdirectory under /home. You can safely ignore these messages.

Once the restore command finishes with this drive, it prompts for additional drives. For this particular backup, it gives me the following messages:

Mount volume 2
Enter "none" if there are no more volumes
otherwise enter volume name (default: /dev/fd0)

In this case, all I need to do is insert the second floppy of the backup into the drive and press Enter. If the rest of the backup is on another device such as /dev/nst0, I would enter that device name when prompted by the "otherwise enter volume name..." message.

Backing up over a network

One reason to have a file server is to keep files in a central location, which makes backups easier. If users on your network store all important data on the file server, then as the administrator, you need only back up the user directories on that server.

If the Linux partition with the /home directory is large enough, the users on your network can simply save their important files to their individual home directories. Your work to back up the /home directory then automatically backs up your users' files. As usual, there are four basic scenarios for backups on a mixed network with Linux and Windows computers:

  • From a Windows client to a Windows file server.
  • From a Windows client to a Linux file server.
  • From a Linux client to a Windows file server.
  • From a Linux client to a Linux file server.

The basic steps that you take to connect from a Linux client are the same, whether you're connecting to a Windows or a Linux file server. For our purposes, some defaults such as home directories do change from server to server, depending on how the Windows and Linux file servers are configured.

I'm assuming that you're using Samba on your Linux computers to connect to other Linux and Windows computers in a Microsoft Windows-style network.

If the file server is a Linux or Windows Domain Member Server, you can share and back up the directories of your choice. If it's a Linux Domain Member Server, you can share its /home directory for read-only backups over the network.

Backing up from a Windows client to a Windows file server

Because this is written for Microsoft Windows administrators, I'm assuming that you know how to configure a backup from a Windows client to a Windows file server. I'll just review the basic technique because you can use the same techniques to back up Linux clients.

Remember, home directories and profiles are part of each user's Microsoft Windows user profile settings. In Windows NT 4 Server, you configure them in the User Environment Profile dialog box of User Manager for Domains.

Naturally, you should configure user home directories and profiles in the same folders. This will ease your burden when you back up these files. Windows NT 4 Server includes its own backup utility, which is accessible from Start | Programs | Administrative Tools (Common) | Backup. (Windows 2000 Server includes a similar backup utility when you click Start | Programs | Accessories | System Tools | Backup.) Because this is a book on Linux, I don't address backups from a Windows client to a Windows server in detail.

Backing up from a Windows client to a Linux file server

Once you've logged in to a Linux PDC, you have access to your home directory on the PDC or any Linux member servers through the Windows Network Neighborhood. If you access it through Windows Explorer, it's easy to back up the files and directories of your choice to your home directory.

Alternatively, if you're willing to create an additional shared directory from your Microsoft Windows client, you can connect to it from the Linux file server by using the smbmount command. Once you've mounted the shared directory from the Microsoft client, you can work with the shared Windows directory as if it were local to the Linux file server. You can then back up the files from that directory using the text commands described earlier.
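The sketch below shows that approach. It assumes a Samba release that still includes the smbmount and smbumount commands, as Red Hat Linux 9 does. The share name //winpc/backup, the mount point /mnt/winpc, the archive path, and the user name mj are hypothetical. The function mounts the share, archives it with tar, and unmounts it even if tar fails.

```shell
#!/bin/sh
# Sketch: back up a share exported by a Windows client to the Linux file server.
backup_windows_share() {  # $1 = //host/share, $2 = mount point, $3 = archive, $4 = user
    mkdir -p "$2" || return 1
    smbmount "$1" "$2" -o username="$4" || return 1   # prompts for the password
    tar czf "$3" -C "$2" .                            # archive the whole share
    status=$?
    smbumount "$2"                                    # unmount even if tar failed
    return $status
}

# Example (as root on the Linux file server):
#   backup_windows_share //winpc/backup /mnt/winpc /backup/winpc.tar.gz mj
```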

Backing up from a Linux client

When you log in to a Linux client computer on a Domain, the user name and password are normally local to the client. In that case, when you connect to a Domain PDC or member server, you'll need to enter the Domain user name and password separately.

For example, in the Linux GUI, you can view the computers on your Domain through Nautilus. Click Main Menu | Network Servers. This opens a Nautilus window pointing to smb:/// to view the Domain. Once you click on the Domain and then select the PDC or member server, you'll be prompted for a Domain user name and password in the Authentication Required dialog.

Once you've connected to the file server, you'll have access to your home directory. You can click and drag the files and directories of your choice from the local /home/mj directory to the /home/mj directory on the file server.

Scheduled backups

As an administrator, you'll want to make sure to back up at least the files in the /home directory on a regular basis using scripts for Linux's cron service.

Linux scripts are text files that you can create with GUI or command-line text editors. For example, the slocate.cron script in the /etc/cron.daily directory starts with the line #!/bin/sh, which tells Linux to run the script with /bin/sh. On Red Hat Linux, /bin/sh is a link to the bash shell.

Let's say you're setting up a full backup of the /home directory to a tape drive, such as /dev/nst0. In that case, you'll want to add the following command to your script:

/sbin/dump 0uf /dev/nst0 /home

Of course, this will work only if you actually have a tape drive connected as the /dev/nst0 device. The u switch records the date of this full backup in /etc/dumpdates, so that later incremental and differential dumps know which files have changed. You'll note that this command includes /sbin before the dump command. Because cron runs scripts with a limited environment, it's safest to give the full path to each command that you use, unless you set a PATH variable in the script.

The previous command is all you need in the script. Save it with an appropriately descriptive file name such as /etc/cron.weekly/fullbak. Red Hat Linux runs scripts in the /etc/cron.weekly directory every Sunday morning at 4:22.

But the script isn't ready until you make it executable. To do so, you'll need to use the chmod command. The following command makes the script readable and executable by all users:

# chmod 755 /etc/cron.weekly/fullbak

You can create similar scripts for incremental or differential backups, based on your own desired schedule. However, the schedule in /etc/crontab may not be flexible enough for your needs, because it runs the scripts in the /etc/cron.weekly directory only on Sunday.

For example, say you wanted to create an incremental backup every Wednesday night at 11:30. You'd need to set up a new directory, such as /etc/cron.wednesday. You could then add the following command to /etc/crontab:

30 23 * * 3 root run-parts /etc/cron.wednesday

Then you could create the script you need for incremental backups, and save it in the /etc/cron.wednesday directory.
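The sketch below creates such a script. It writes /etc/cron.wednesday/incrbak (a hypothetical name) containing a level 1 dump, which saves the files changed since the last level 0 dump, and makes the script executable. The PREFIX argument lets you rehearse in a scratch directory; pass an empty string, as root, to write under the real /etc. The install_incr_script function name is also hypothetical.

```shell
#!/bin/sh
# Sketch: create the Wednesday script that the run-parts line above executes.
install_incr_script() {  # $1 = PREFIX ("" for the real /etc)
    dir="$1/etc/cron.wednesday"
    mkdir -p "$dir" || return 1
    cat > "$dir/incrbak" <<'EOF'
#!/bin/sh
# Level 1 dump of /home: saves files changed since the last level 0 dump.
# u records the dump in /etc/dumpdates; /dev/nst0 is the tape drive.
/sbin/dump 1uf /dev/nst0 /home
EOF
    chmod 755 "$dir/incrbak"
}

# Example (as root):
#   install_incr_script ""
```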

The aforementioned script is just a sample of what you can do. You're not limited to the dump command. Depending on the hardware you have available, you can configure the other commands in this article in scripts that back up your /home directory to appropriate media such as writable CDs or DVDs.

As of this writing, Linux does not support automation of GUI utilities. Therefore, you need to use text commands to back up the /home directory from a Linux file server.
