Get to Know rsync


One of the handiest tools you’ll ever have on Linux (and other *nix systems) is rsync. As the name implies, rsync is used to sync files on remote and local machines. You can use rsync to copy files to remote systems and back again, or to make backups to locally mounted hard drives. A quick glance at the man page might be a bit intimidating, though, so let’s walk through some of the most commonly used options and see if rsync is right for you.

One word of caution before beginning: rsync is great at moving data from one machine to another, but if used incorrectly, it can be just as effective at overwriting data or deleting files. Be mindful that once files are deleted on Linux they’re very hard, if not impossible, to recover, and that’s probably the opposite of what you’re trying to accomplish.

The first few times you run an rsync command, you might want to use the -n option (or --dry-run, if you prefer the longer option name). This performs a trial run that shows what rsync would do without actually changing anything.

When to Use rsync

If you use a single computer for personal use, you might want a tool like iFolder or Dropbox for simple backups. Those tools are well suited to things like making sure a Documents directory is backed up regularly, or available from another location when you’re traveling or in the event of a crash. They’d be a good topic for a separate article, but the use case for iFolder or Dropbox is different from rsync’s.

Use rsync when you want unattended backups or complex syncing operations. It can do many things that GUI tools can’t, or don’t do well, like syncing files from your home directory to a locally attached USB disk. The GUI tools do some things rsync can’t (such as sharing files with other users and groups, as iFolder does) or doesn’t do easily, like dropping a file in a folder and having it automatically synced to a remote machine.

Using rsync

Most Linux distros should have rsync installed by default. If not, use your distro’s package manager to install rsync. For instance, on openSUSE I’d use:

sudo zypper in rsync

The package should be “rsync” on just about any major distribution.

To send a file from point A to point B, you tell rsync to copy from your source to the destination. This can be a local filesystem, or a remote system. I use rsync to back up to a local USB disk when I travel, and also use rsync to back up files to an offsite service and from my Web server to my local machine for archival.

Here’s a basic example to copy files from a home directory to a USB drive:

rsync -avh --exclude="*.iso" /home/user/bin/ /media/diskid/user_backup/

As you can see, you can combine single-letter options after one dash. The -avh following rsync is the same as -a -v -h, but much easier to type.

Let’s break that down. You have the rsync command followed by four options (-a, -v, -h, and --exclude) and two arguments: the source and the destination. The archive option (-a) tells rsync to copy files recursively and to preserve symbolic links, permissions, modification times, and group and user ownership. This is generally a good idea to include.

The verbose option (-v) tells rsync to print what it’s doing in greater detail. I like to use it when I’m testing an rsync command before adding it to a script. With -v, rsync prints the list of files it’s sending, followed by a summary of the bytes sent and received and the transfer rate. If you omit -v, rsync works silently by default and reports only errors and warnings, so you won’t see a file list. To cut the output down further, use the -q option to “quiet” rsync, which suppresses non-error messages and can be useful when calling rsync from scripts.

The human-readable option (-h) directs rsync to print sizes in more readable units (K, M, G) instead of raw byte counts, which aren’t terribly useful for large transfers. One quirk: if -h is the only option you give, rsync treats it as --help and prints its help text instead.

Finally, the --exclude option directs rsync to skip files whose names end in .iso. I download milestone and release candidate images of openSUSE pretty often, and don’t see any need to maintain an offsite or local backup of them since I can always get them again if needed.

Unattended Backups

Copying files to a locally attached disk can be useful, but I also like to do a backup to a remote machine right before I travel so that I have access to my most important files even if my laptop dies or is stolen. It hasn’t happened yet, but I attribute that to doing backups beforehand. If there’s anything one should be superstitious about, it’s backing up files. If you believe nothing else, believe that files that are not backed up will spontaneously delete themselves if only to spite you.

But make it easy on yourself. Odds are you don’t feel like sitting at your computer typing a password every time rsync opens a new connection to a remote host. Setting up SSH key authentication with the remote host lets rsync run unattended backups, and lets you SSH into that host without typing a password each time. If you haven’t set this up before, it’s very easy. Open a terminal and run:

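ssh-keygen -t ed25519

This creates a public and private SSH key pair for you. Avoid the older -t dsa key type: current OpenSSH releases reject DSA keys by default. You’ll be asked where to save the key (accept the default) and then prompted twice for a passphrase; just hit Enter each time. Now you need to get the public key to the remote host. For this, use ssh-copy-id:

ssh-copy-id -i ~/.ssh/id_ed25519.pub user@host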

ssh-copy-id is a script that copies your public key to the remote host. It may not work on systems that have the remote shell disabled; for instance, I’ve had no success getting it to work with rsync.net. In those cases, you need to append the contents of your public key file to ~/.ssh/authorized_keys on the remote system yourself.

Note that because the key has no passphrase, anyone who gains access to your local account can also log in to those remote machines without a password.
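For scripting, ssh-keygen can also run without prompts: -N "" sets an empty passphrase and -f picks the output file. The sketch below writes an Ed25519 key pair into a temporary directory (an assumption made so nothing in ~/.ssh is touched). The commented-out check uses ssh’s BatchMode option, which makes ssh fail instead of prompting for a password; user@host is a placeholder.

```shell
#!/bin/sh
# Temporary key directory for the demo; a real key would live in ~/.ssh.
keydir=$(mktemp -d)

# -q: quiet, -t: key type, -N "": empty passphrase, -f: output file.
ssh-keygen -q -t ed25519 -N "" -f "$keydir/id_ed25519"

# The private key stays local; the .pub file is what goes on the remote host.
ls "$keydir"

# Placeholder check: with BatchMode=yes, ssh exits non-zero rather than
# asking for a password, so this succeeds only if key login works.
# ssh -o BatchMode=yes user@host true && echo "key login OK"
```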

Now it’s time to tell rsync to use SSH and point it at a remote host. To do this, use the -e ssh option, like so:

rsync -avze ssh /home/user/directory/ user@remote.host.net:/home/user/directory/

Here I tacked the remote shell option (-e) onto -avz and told rsync to use SSH. rsync can also use other remote shells such as rsh, but in practice I’ve never seen it used with anything but SSH. Recent versions of rsync use ssh by default, so -e ssh is often optional, but spelling it out does no harm.

When specifying directories, pay attention to trailing slashes, especially on the source. In the example above, /home/user/directory is not the same as /home/user/directory/. With the trailing slash, rsync copies the contents of the directory but not the directory itself. Without it, rsync copies the directory itself, creating a directory named directory inside the destination.

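What does the -z option do? It tells rsync to compress data during the transfer, which helps on slow network links but is of little use when copying to a local disk.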

Note that you can go the other way too. To back up a remote system to the local system, just swap the source and destination, like so:

rsync -avze ssh user@remote.host.net:/home/user/directory/ /home/user/directory/

Other Important Options

If you want to maintain an exact copy of a directory, add the --delete option to your rsync command. This deletes any files on the destination that aren’t present in the source.

The advantage of --delete is that you keep a more or less identical copy of the two directory trees. The disadvantage is that if you accidentally delete a local file and have an rsync backup running at regular intervals, the next run deletes the backup copy too, and you lose the chance to recover the file from the remote backup.

The flip side of --delete is --backup, used with --backup-dir or --suffix. If a file already exists on the destination, rsync backs it up before overwriting it with the file being transferred. I don’t use these much myself, but they might be useful in some cases.

When running an attended copy, I like to use --progress so I can see what’s going on as rsync copies each file. For unattended backups, --progress isn’t terribly useful.

Go Forth and Synchronize

I’ve seen a lot of people put off by rsync because its vast array of options and arguments seems intimidating to new users. I hope this tutorial helps you get started. Be sure to read the rsync man page too, and if there’s something it seems rsync should be able to do, I strongly recommend searching Google to see whether someone has blogged about it or answered a question about it on a mailing list or forum.

Joe ‘Zonker’ Brockmeier is a longtime FOSS advocate, and currently works for Novell as the community manager for openSUSE. Prior to joining Novell, Brockmeier worked as a technology journalist covering the open source beat for a number of publications, including Linux Magazine, Linux Weekly News, Linux.com, UnixReview.com, IBM developerWorks, and many others.