November 4, 2004

Making secure remote backups with Rsync

Author: Preston St. Pierre

Backups are more important than ever these days, as our digital information collections expand. Many Linux users know rsync as a file transfer utility, but rsync can also be an efficient tool for automating remote backups of your Linux, Windows, and even Mac OS X systems.

In an earlier article, I explained how to use rsync to make local backups of a Linux system. Remote backups, where you store your backed-up data on a separate machine, further promote data safety by separating information both physically and geographically.

First steps

To perform secure remote backups, you must have rsync and SSH installed on both your local and your target remote machine. Rsync can use SSH as a secure transport agent.

Make sure rsync is installed by opening a terminal session and typing rsync --version on each machine. You should see a message like rsync version 2.X.X protocol version X. If you receive "command not found" or a similar message, you'll need to download and install rsync. Use your GNU/Linux distribution's package management system to do this, or download and install the source from the rsync Web site. If you're running Microsoft Windows, I recommend installing cwRsync. Mac OS X comes with rsync, but if you want to try a different version, check out RsyncX.
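If you have several machines to check, a small script can report whether both rsync and ssh are on the PATH. This is a minimal sketch that only looks for the programs locally; run it on each machine involved in the backup.

```shell
#!/bin/sh
# Report whether rsync and ssh are on this machine's PATH.
# Run it on both the local and the remote host.
missing=0
for tool in rsync ssh; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool ($(command -v "$tool"))"
  else
    echo "missing: $tool"
    missing=$((missing + 1))
  fi
done
echo "$missing tool(s) missing"
```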

SSH is likely to already be installed on your Mac OS X and GNU/Linux systems, while the Windows port of rsync, cwRsync, includes the key SSH programs. I'm going to assume you're running Linux or OS X on the remote machine where the backup is to be stored. Make sure your remote machine has Secure Shell Daemon (sshd) running and that the users of both machines have proper permissions to execute a backup.

To ensure that sshd is running on the remote machine, open a terminal session and type ssh [user]@[remote.machines.address]. If all is well, you should be asked for the user's password and allowed to log in and check permissions. If the remote machine is the destination where the rsync backup will be stored, you'll want read and write permission to the destination directory.

Once you know all the necessary programs and permissions are in place, choose a directory with a few small files to use as a test backup. On the remote machine, create a destination directory to hold your backups. Then, from the local machine, run:

rsync -avz -e ssh /some/small/directory/ remote_user@remotehost.com:/backup/destination/directory/

The trailing slash in the source directory causes rsync to copy only the contents of the source directory. Omitting the trailing slash causes rsync to copy both the directory name and its contents to the destination.
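The trailing-slash rule is easy to see without involving SSH at all, because rsync behaves the same way between two local directories. This sketch assumes rsync is installed and works entirely inside a temporary directory, so it touches none of your real files.

```shell
#!/bin/sh
# Show rsync's trailing-slash rule using two local copies (no SSH involved).
# Assumes rsync is installed; all files live in a throwaway temp directory.
command -v rsync >/dev/null 2>&1 || { echo "rsync not found" >&2; exit 0; }
set -e
demo=$(mktemp -d)
mkdir "$demo/src" "$demo/with_slash" "$demo/without_slash"
echo "hello" > "$demo/src/file.txt"

# Trailing slash on the source: copy only what is inside src.
rsync -a "$demo/src/" "$demo/with_slash/"
# No trailing slash: copy the directory src itself, contents included.
rsync -a "$demo/src" "$demo/without_slash/"

# Expect with_slash/file.txt and without_slash/src/file.txt
find "$demo/with_slash" "$demo/without_slash" -type f
```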

After rsync completes, you'll be left with a copy of the source files on the remote computer. Congratulations on your first remote backup! Now let's automate the process.

Automatic backups

The first step in automating remote backups is to remove any required user intervention -- namely requests for SSH passwords. To allow your systems to make an SSH connection without asking for a password, you must generate passphraseless keys. On the local machine, drop into the terminal and enter:


$ ssh-keygen -t rsa -b 2048 -f ~/rsync-key
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase): [press enter here]
Enter same passphrase again: [press enter here]
Your identification has been saved in /home/user/rsync-key.
Your public key has been saved in /home/user/rsync-key.pub.
The key fingerprint is:
8c:57:af:68:cd:b2:7c:aa:6d:d6:ee:0a:5a:a4:29:03 user@localhost
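The same key can be generated without any prompts by passing the empty passphrase on the command line with -N "". The sketch below uses an RSA key, since current OpenSSH releases only generate 1024-bit DSA keys and disable DSA authentication by default. It writes into a temporary directory so an existing ~/rsync-key is not overwritten; that location is an assumption made only for the demonstration.

```shell
#!/bin/sh
# Non-interactive key generation: -N "" supplies the empty passphrase,
# -q suppresses the progress output. Writes to a temp directory so that an
# existing ~/rsync-key is left alone.
command -v ssh-keygen >/dev/null 2>&1 || { echo "ssh-keygen not found" >&2; exit 0; }
set -e
keydir=$(mktemp -d)
ssh-keygen -q -t rsa -b 2048 -N "" -f "$keydir/rsync-key"
ls -l "$keydir"
```

For the real key, run the same ssh-keygen command with -f ~/rsync-key.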

Now copy the public key to the remote machine using Secure Copy:

scp ~/rsync-key.pub user@remotehost:~

Finally, put the public key into the authorized_keys file on the remote host. SSH into the remote machine using ssh user@remotehost.com and execute:


mkdir -p ~/.ssh
chmod 700 ~/.ssh
mv ~/rsync-key.pub ~/.ssh/
cd ~/.ssh/
touch authorized_keys
chmod 600 authorized_keys
cat rsync-key.pub >> authorized_keys
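The chmod commands above matter: with StrictModes enabled (the OpenSSH default), sshd will not use authorized_keys if that file, ~/.ssh, or your home directory is writable by other users, and you will be asked for a password anyway. This sketch, to be run on the remote machine, flags any of those paths that have the group-write or other-write bit set. The helper name is_too_open is hypothetical, and it checks permission bits only, not ownership.

```shell
#!/bin/sh
# is_too_open (hypothetical helper): succeeds if the path has the
# group-write or other-write permission bit set. -prune keeps find from
# descending into directories; we only care about the path itself.
is_too_open() {
  [ -n "$(find "$1" -prune \( -perm -020 -o -perm -002 \) -print)" ]
}

for f in "$HOME" "$HOME/.ssh" "$HOME/.ssh/authorized_keys"; do
  if [ ! -e "$f" ]; then
    echo "missing:   $f"
  elif is_too_open "$f"; then
    echo "too open:  $f (fix with chmod go-w)"
  else
    echo "ok:        $f"
  fi
done
```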

You should now be able to SSH into the remote machine without being asked for a password. Give it a try by closing your previous SSH session and creating another one by typing ssh -i ~/rsync-key user@remotehost.

As it stands, this passphraseless key can be used from any host to run any command on the remote machine. You can tighten security by restricting what connections made with the key can do, using options in the authorized_keys file. I recommend holding off on these restrictions until your first backup has succeeded, to keep troubleshooting simple. Once it has, SSH into the remote machine and edit your ~/.ssh/authorized_keys file. It should look similar to:

ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEAyNChQxw/+Da....= user@localhost

To limit where connections are coming from, prefix the key with from="ip.address". To limit what command is executed, prefix the key with command="/path/to/validating/script". Separate multiple options with commas and no spaces. As an example, your secured authorized_keys file might look like:

from="192.168.0.1",command="/home/user/validate-rsync.sh" ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEAyNChQxw/+Da....= user@localhost
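Because sshd rejects option lists that contain stray spaces, it can be safer to generate the restricted line than to type it. This is a minimal sketch; restrict_key is a hypothetical helper, and the address and script path passed to it are just the example values used in this article.

```shell
#!/bin/sh
# restrict_key (hypothetical helper): print an authorized_keys entry that
# prefixes a public key with from= and command= options. The options are
# joined by a comma with no spaces, as sshd requires.
restrict_key() {
  from_ip=$1      # address the backups will connect from
  forced_cmd=$2   # forced command, e.g. /home/user/validate-rsync.sh
  pubkey_file=$3  # public key file, e.g. rsync-key.pub
  printf 'from="%s",command="%s" %s\n' "$from_ip" "$forced_cmd" "$(cat "$pubkey_file")"
}
```

For example, restrict_key 192.168.0.1 /home/user/validate-rsync.sh ~/.ssh/rsync-key.pub prints a line in the format shown above. Use it to replace the existing unrestricted entry for that key rather than adding a second one, since an unrestricted copy of the same key could still be used.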

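Finally, create the validate-rsync.sh script on the remote machine, at the path named in the command= option (/home/user/validate-rsync.sh in this example), with contents like the following: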


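#!/bin/sh
# Forced command: let through only "rsync --server ..." requests.
# Reject shell metacharacters, and report rejections on stderr with a
# non-zero exit so the client sees a failure instead of a false success.
case "$SSH_ORIGINAL_COMMAND" in
*\&*|*\;*|*\|*|*\`*|*\<*|*\>*|*\(*)
echo "Rejected" >&2
exit 1
;;
rsync\ --server*)
$SSH_ORIGINAL_COMMAND
;;
*)
echo "Rejected" >&2
exit 1
;;
esac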

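Make the script executable by typing chmod +x ~/validate-rsync.sh. Because sshd runs this forced command in place of whatever command the client requested (passing the requested command along in $SSH_ORIGINAL_COMMAND), the script lets the session proceed only if that command starts with rsync --server, which is what an rsync client asks the remote end to run. If the session is being used for anything else, it is rejected and closed.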
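Before trusting a filter like this, you can exercise its case patterns locally without running anything. This sketch wraps patterns like those in validate-rsync.sh (also rejecting pipes, backticks, redirection and parentheses) in a function that only reports accept or reject. The function name is_allowed_command and the sample rsync option string are illustrative assumptions.

```shell
#!/bin/sh
# is_allowed_command (hypothetical): return 0 if a forced-command filter
# built from these patterns would let the command through, 1 otherwise.
is_allowed_command() {
  case "$1" in
    *\&*|*\;*|*\|*|*\`*|*\<*|*\>*|*\(*) return 1 ;;
    rsync\ --server*) return 0 ;;
    *) return 1 ;;
  esac
}

for cmd in \
  "rsync --server -vlogDtprze.iLsf . /backup/destination/directory/" \
  "rsync --server . /backup; rm -rf /" \
  "ls /"; do
  if is_allowed_command "$cmd"; then
    echo "allowed:  $cmd"
  else
    echo "rejected: $cmd"
  fi
done
```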

Now modify the earlier test backup to use the key; rsync should complete without prompting for a password. Try it out by typing:

rsync -avz -e "ssh -i ~/rsync-key" /some/small/directory/ remote_user@remotehost.com:/backup/destination/directory/

If you're having problems, make sure you have permission to read from the source (/some/small/directory) and to write to the target (remotehost:/backup/destination/directory). Also make sure that passphraseless SSH sessions can be established between the two hosts, that ~/rsync-key exists on the machine being backed up, and that rsync is installed on both machines.
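The local items on that checklist can be scripted. This is a minimal sketch with a hypothetical preflight function that stops at the first problem it finds; the remote side still has to be checked over SSH, as noted in the comment at the end.

```shell
#!/bin/sh
# preflight (hypothetical): check the local prerequisites for the backup.
# Usage: preflight KEYFILE SOURCE_DIR   (returns non-zero on the first problem)
preflight() {
  key=$1 src=$2
  if [ ! -r "$key" ]; then
    echo "problem: cannot read key file $key" >&2
    return 1
  fi
  if [ ! -d "$src" ] || [ ! -r "$src" ] || [ ! -x "$src" ]; then
    echo "problem: cannot read source directory $src" >&2
    return 1
  fi
  echo "ok: $key and $src look usable"
}

# The remote side can be tested without risking a password prompt
# (before any command= restriction is in place), e.g.:
#   ssh -o BatchMode=yes -i ~/rsync-key remote_user@remotehost.com true
```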

Making a real backup

Provided all went well, it's time to make a real backup. We'll modify the backup.sh script from my earlier article so that it stores the backup remotely. Create an rbackup.sh script by copying the code below into your favorite text editor. Decide on the files and directories you'd like to back up and add them to the $SOURCES variable in the script:

#!/bin/sh
# Author: Brice Burgess - bhb@iceburg.net
# rbackup.sh -- secure backup to a remote machine using rsync.

# Directories to back up. Separate with a space. Exclude trailing slash!
SOURCES="/home/wendy /home/daisy /var/mail"

# IP or FQDN of Remote Machine
RMACHINE=192.168.0.2

# Remote username
RUSER=brice

# Location of passphraseless ssh keyfile
RKEY=/home/user/rsync-key

# Directory to back up to on the remote machine. This is where your backup(s) will be stored.
# Exclude trailing slash!
RTARGET="/home/user/backups/my_machine"

# Your EXCLUDE_FILE tells rsync what NOT to back up. Leave it unchanged, missing or
# empty if you want to back up all files in your SOURCES. If performing a
# FULL SYSTEM BACKUP, i.e. your SOURCES is set to "/", you will need to make
# use of EXCLUDE_FILE. The file should contain directories and filenames, one per line.
# An example of an EXCLUDE_FILE would be:
# /proc/
# /tmp/
# /mnt/
# *.SOME_KIND_OF_FILE
EXCLUDE_FILE="/path/to/your/exclude_file.txt"

# Comment out the following line to disable verbose output
VERBOSE="-v"

#######################################
########DO_NOT_EDIT_BELOW_THIS_POINT#########
#######################################

if [ ! -f "$RKEY" ]; then
  echo "Couldn't find ssh keyfile!"
  echo "Exiting..."
  exit 2
fi

# The target must exist and be writable by $RUSER.
if ! ssh -i "$RKEY" "$RUSER@$RMACHINE" "test -d $RTARGET && test -w $RTARGET"; then
  echo "Target directory on remote machine doesn't exist or bad permissions."
  echo "Exiting..."
  exit 2
fi

echo "Verifying Sources..."
for source in $SOURCES; do
  echo "Checking $source..."
  if [ ! -x "$source" ]; then
    echo "Error with $source!"
    echo "Directory either does not exist, or you do not have proper permissions."
    exit 2
  fi
done

# Quote $EXCLUDE_FILE: if it is empty, an unquoted [ -f ] would test true.
if [ -f "$EXCLUDE_FILE" ]; then
  EXCLUDE="--exclude-from=$EXCLUDE_FILE"
fi

echo "Sources verified. Running rsync..."
for source in $SOURCES; do

  # Create directories in $RTARGET to mimic the source directory hierarchy
  if ! ssh -i "$RKEY" "$RUSER@$RMACHINE" "test -d $RTARGET/$source"; then
    ssh -i "$RKEY" "$RUSER@$RMACHINE" "mkdir -p $RTARGET/$source"
  fi

  # $VERBOSE and $EXCLUDE stay unquoted so that they disappear when empty.
  rsync $VERBOSE $EXCLUDE -a --delete -e "ssh -i $RKEY" "$source/" "$RUSER@$RMACHINE:$RTARGET/$source/"

done

exit 0

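Change the $RMACHINE, $RTARGET, $RUSER, and $RKEY variables to appropriate values. Save the script as rbackup.sh and make it executable by typing: chmod +x rbackup.sh. Note that this script, like the rotating version later in this article, runs commands such as test and mkdir on the remote machine over SSH in addition to rsync. Those commands are rejected if the key is restricted with the validate-rsync.sh forced command, so leave the command= option off the key used by these scripts (the from= restriction still works).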

If your local machine uses Mac OS X, these methods should work for you within the terminal program. If you're using Windows, the cwRsync installer creates a file named cwrsync.cmd that you can customize as a backup script. Save this file as a batch (.bat) file.

Execute your backup script by typing ./rbackup.sh in the Linux or OS X terminal, or run your batch script from the Windows command prompt. The first run takes far longer than later ones, because rsync must copy every file; subsequent runs transfer only what has changed.

If you notice something is wrong, press Ctrl-C to stop the process. Upon completion of the script, there should be a replica of your $SOURCES on the remote machine.

Automating the process

Assuming rbackup.sh ran successfully, it's time to automate the process. Use the Scheduled Task accessory to run your batch file under Windows. For Linux and OS X, use the cron daemon to schedule backups. The cron daemon reads crontab files to schedule tasks. To schedule the backup as root, become the superuser (either by logging in as root or typing su in the terminal) and run crontab -e, which opens root's crontab in your system's default editor.

You'll want to schedule a time for your rbackup.sh to execute. Crontab syntax is:

[minute] [hour] [day of month] [month] [day of week] [command]

Thus, adding the line

0 4 * * * /path/to/rbackup.sh

will execute rbackup.sh at 4:00 a.m. every day, and

0 4 * * 5 /path/to/rbackup.sh

will execute rbackup.sh at 4:00 a.m. every Friday. When you've finished adding the line, save the file and exit.

Keeping multiple backups

You may want to keep multiple remote backups if there is enough space at the destination to hold them. Multiple backups, known as 'snapshots', are important to have in case you've accidentally deleted files you'd like to retain. Check the size of your backup by executing du -sh /rsync/target_directory on the machine holding your backups. You can estimate the number of backups to retain by dividing the amount of space you're willing to allocate to backups by your backup size and rounding down to the nearest integer. Because the rotating script hard-links files that haven't changed between snapshots, each additional snapshot usually takes far less space than a full backup, so this estimate is conservative. If you have the space, keep snapshots of your five past backups.
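The estimate is a single integer division, which shell arithmetic rounds down automatically. In this sketch snapshots_that_fit is a hypothetical helper, and the 100 and 18 in the example call are made-up sizes (say, gigabytes); both arguments just need to be in the same unit, such as the kilobyte figures reported by du -sk.

```shell
#!/bin/sh
# snapshots_that_fit (hypothetical): how many full-size backups fit in a
# space budget. $(( )) is integer-only, so the result is already rounded down.
snapshots_that_fit() {
  budget=$1 backup=$2
  if [ "$backup" -le 0 ]; then
    echo "backup size must be positive" >&2
    return 1
  fi
  echo $((budget / backup))
}

# Example with made-up numbers: a 100 GB budget and an 18 GB backup.
snapshots_that_fit 100 18
```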

I've modified the above script to support rotating multiple backups. The modified version keeps a designated number of backups in the remote machine's target directory, each named after the date and time it was made (YYYY-MM-DD_Hour-Minute). Here's the modified script:

#!/bin/sh
# Author: Brice Burgess - bhb@iceburg.net
# multi_rbackup.sh -- secure backup to a remote machine using rsync.
# Uses hard-link rotation to keep multiple backups on the remote machine.

# Directories to back up. Separate with a space. Exclude trailing slash!
SOURCES="/home/wendy /home/daisy /var/mail"

# IP or FQDN of Remote Machine
RMACHINE=192.168.0.2

# Remote username
RUSER=brice

# Location of passphraseless ssh keyfile
RKEY=/home/user/rsync-key

# Directory to back up to on the remote machine. This is where your backup(s) will be stored.
# :: NOTICE :: -> Make sure this directory is empty or contains ONLY backups created by
#                 this script and NOTHING else. Exclude trailing slash!
RTARGET="/home/user/backups/my_machine"

# Set the number of backups to keep (greater than 1). Ensure you have adequate space.
ROTATIONS=3

# Your EXCLUDE_FILE tells rsync what NOT to back up. Leave it unchanged, missing or
# empty if you want to back up all files in your SOURCES. If performing a
# FULL SYSTEM BACKUP, i.e. your SOURCES is set to "/", you will need to make
# use of EXCLUDE_FILE. The file should contain directories and filenames, one per line.
# An example of an EXCLUDE_FILE would be:
# /proc/
# /tmp/
# /mnt/
# *.SOME_KIND_OF_FILE
EXCLUDE_FILE="/path/to/your/exclude_file.txt"

# Comment out the following line to disable verbose output
VERBOSE="-v"

#######################################
########DO_NOT_EDIT_BELOW_THIS_POINT#########
#######################################

if [ ! -f "$RKEY" ]; then
  echo "Couldn't find ssh keyfile!"
  echo "Exiting..."
  exit 2
fi

# The target must exist and be writable by $RUSER.
if ! ssh -i "$RKEY" "$RUSER@$RMACHINE" "test -d $RTARGET && test -w $RTARGET"; then
  echo "Target directory on remote machine doesn't exist or bad permissions."
  echo "Exiting..."
  exit 2
fi

# Set name (date) of backup.
BACKUP_DATE="`date +%F_%H-%M`"

if [ ! $ROTATIONS -gt 1 ]; then
  echo "You must set ROTATIONS to a number greater than 1!"
  echo "Exiting..."
  exit 2
fi

#### BEGIN ROTATION SECTION ####

BACKUP_NUMBER=1
# incrementor used to determine current number of backups

# List all backups newest first. The names are fixed-width timestamps, so a
# plain reverse sort is chronological (GNU-only ls -X is not needed).
# Once the retention number is reached, note the oldest backup in $OLDEST_BACKUP.
for backup in `ssh -i "$RKEY" "$RUSER@$RMACHINE" "ls -dr $RTARGET/*/ 2>/dev/null"`; do
	if [ $BACKUP_NUMBER -eq 1 ]; then
		NEWEST_BACKUP="$backup"
	fi

	if [ $BACKUP_NUMBER -eq $ROTATIONS ]; then
		OLDEST_BACKUP="$backup"
		break
	fi

	# POSIX arithmetic: "let" is a bash builtin and fails under some /bin/sh shells.
	BACKUP_NUMBER=$((BACKUP_NUMBER + 1))
done

# If $OLDEST_BACKUP was found, remove it to make room, then create the new backup directory.
if [ -n "$OLDEST_BACKUP" ]; then
  ssh -i "$RKEY" "$RUSER@$RMACHINE" "rm -rf $OLDEST_BACKUP"
fi
ssh -i "$RKEY" "$RUSER@$RMACHINE" "mkdir $RTARGET/$BACKUP_DATE"

# Fill the new backup with hard links to the most recent backup
# (cp -al requires GNU cp on the remote machine).
if [ -n "$NEWEST_BACKUP" ]; then
  ssh -i "$RKEY" "$RUSER@$RMACHINE" "cp -al $NEWEST_BACKUP. $RTARGET/$BACKUP_DATE"
fi

#### END ROTATION SECTION ####

# Check to see if rotation section created backup destination directory
if ! ssh -i $RKEY $RUSER@$RMACHINE "test -d $RTARGET/$BACKUP_DATE"; then
  echo "Backup destination not available."
  echo "Make sure you have write permission in RTARGET on Remote Machine."
  echo "Exiting..."
  exit 2
fi

echo "Verifying Sources..."
for source in $SOURCES; do
  echo "Checking $source..."
  if [ ! -x "$source" ]; then
    echo "Error with $source!"
    echo "Directory either does not exist, or you do not have proper permissions."
    exit 2
  fi
done

# Quote $EXCLUDE_FILE: if it is empty, an unquoted [ -f ] would test true.
if [ -f "$EXCLUDE_FILE" ]; then
  EXCLUDE="--exclude-from=$EXCLUDE_FILE"
fi

echo "Sources verified. Running rsync..."
for source in $SOURCES; do

  # Create directories in the new backup to mimic the source directory hierarchy
  if ! ssh -i "$RKEY" "$RUSER@$RMACHINE" "test -d $RTARGET/$BACKUP_DATE/$source"; then
    ssh -i "$RKEY" "$RUSER@$RMACHINE" "mkdir -p $RTARGET/$BACKUP_DATE/$source"
  fi

  # $VERBOSE and $EXCLUDE stay unquoted so that they disappear when empty.
  rsync $VERBOSE $EXCLUDE -a --delete -e "ssh -i $RKEY" "$source/" "$RUSER@$RMACHINE:$RTARGET/$BACKUP_DATE/$source/"

done

exit 0
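The space savings in multi_rbackup.sh come from hard links: cp -al builds the new snapshot as a tree of links to the previous one, and rsync then replaces only the files that changed. This local sketch shows both halves of that idea in a temporary directory. It assumes GNU cp for -l, and it imitates rsync's default behavior of writing a changed file to a temporary name and renaming it into place, which is what keeps the older snapshot intact.

```shell
#!/bin/sh
# Local illustration of hard-link snapshots. Assumes GNU cp (for -l);
# everything happens inside a temporary directory.
set -e
work=$(mktemp -d)
mkdir "$work/older" "$work/newer"
echo "unchanged" > "$work/older/a.txt"

# New snapshot = a tree of hard links to the previous one (no data copied).
cp -al "$work/older/." "$work/newer"

# Both names refer to the same inode, so the file's data is stored once.
inode_old=$(ls -i "$work/older/a.txt" | awk '{print $1}')
inode_new=$(ls -i "$work/newer/a.txt" | awk '{print $1}')
echo "inodes: older=$inode_old newer=$inode_new"

# A changed file is written to a temporary name and renamed over the link,
# the way rsync does by default, so the older snapshot keeps the old data.
echo "changed" > "$work/newer/a.txt.tmp"
mv "$work/newer/a.txt.tmp" "$work/newer/a.txt"
echo "older: $(cat "$work/older/a.txt")  newer: $(cat "$work/newer/a.txt")"
```

Editing a hard-linked file in place instead would change it in every snapshot at once, which is why the rename step matters.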

Rsync is a powerful tool not only for file transfers, but also for advanced and secure backups to remote machines. If you would like to learn more about backing up with rsync, Mike Rubel provides a great tutorial and reference section on his Web site.
