Linux.com

Back up like an expert with rsync

What's so great about rsync? First, it's designed to speed up file transfer by copying the differences between two files rather than copying an entire file every time. For example, when I'm writing this article, I can make a copy via rsync now and then another copy later. The second (and third, fourth, fifth, etc.) time I copy the file, rsync copies the differences only. That takes far less time, which is especially important when you're doing something like copying a whole directory offsite for daily backup. The first time may take a long time, but the next will only take a few minutes (assuming you don't change that much in the directory on a daily basis).

Another benefit is that rsync can preserve permissions and ownership information, copy symbolic links, and generally is designed to intelligently handle your files.

You shouldn't need to do anything to get rsync installed -- it should be available on almost any Linux distribution by default. If it's not, you should be able to install it from your distribution's package repositories. You will need rsync on both machines if you're copying data to a remote system, of course.

When you're using it to copy files to another host, the rsync utility typically works over a remote shell, such as Secure Shell (SSH) or Remote Shell (RSH). We'll work with SSH in the following examples, because RSH is not secure and you probably don't want to be copying your data using it. It's also possible to connect to a remote host using an rsync daemon, but since SSH is practically ubiquitous these days, there's no need to bother.

Getting to know rsync

The basic syntax for rsync is simple enough -- just run rsync [options] source destination to copy the file or files provided as the source argument to the destination.

So, for example, if you want to copy some files under your home directory to a USB storage device, you might use rsync -a /home/user/dir/ /media/disk/dir/. By the way, "/home/user/dir/" and "/home/user/dir" are not the same thing to rsync. Without the final slash, rsync will copy the directory in its entirety, creating dir inside the destination. With the trailing slash, it will copy the contents of the directory but won't recreate the directory itself. If you're trying to replicate a directory structure with rsync, you should omit the trailing slash -- for instance, if you're mirroring /var/www on another machine or something like that.

In this example, I included the archive option (-a), which actually combines several rsync options. It is shorthand for -rlptgoD: it recurses into directories, copies symlinks as symlinks, preserves permissions, modification times, group, and owner, and copies device and special files, which generally makes rsync suitable for making archive copies. Note that it doesn't preserve hardlinks; if you want to preserve them, you will need to add the hardlinks option (-H).

Another option you'll probably want to use most of the time is verbose (-v), which tells rsync to report lots of information about what it's doing. You can double and triple up on this option -- so using -v will give you some information, using -vv will give more, and using -vvv will tell you everything that rsync is doing.

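rsync will copy hidden files (files whose names begin with a .) without any special options. If you want to exclude hidden files and directories, you can use the option --exclude=".*"; the variant --exclude=".*/" excludes only hidden directories, because a trailing slash in a pattern matches directories only. You can also use the --exclude option to prevent copying things like Vim's swap files (--exclude="*.swp") and automatic backups (--exclude="*.bak") created by some programs.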

Making local copies

Suppose you have an external USB or FireWire drive, and you want to copy data from your home directory to your external drive. A good way to do this would be to keep all your important data under a single top-level directory and then copy it to a backup directory on the external drive using a command like:

rsync -avh /home/user/dir/ /media/disk/backup/

If you want to make sure that local files you've deleted since the last time you ran rsync are deleted from the external drive as well, you'll want to add the --delete option, like so:

rsync -avh --delete /home/user/dir/ /media/disk/backup

Be very careful with the delete option; with it, you can whack a bunch of files without meaning to. In fact, while you're getting used to rsync, it's probably a good idea to use the --dry-run option with your commands to run through the transfer first, without actually copying or synching files. If you do start an rsync transfer and realize that you've botched the command in some way that might result in the destruction of data, press Ctrl-c immediately to terminate the transfer. Some files may be gone, but you may be able to save the rest.

Making remote copies

What if you want to copy files offsite to a remote host? No problem -- all you need to do is add the host and user information. So, for instance, if you want to copy the same directory to a remote host, you'd use:

rsync -avhe ssh --delete /home/user/dir/ user@remote.host.com:dir/

If you want to know how fast the transfer is going, and how much remains to be copied, add the --progress option:

rsync --progress -avhe ssh --delete /home/user/dir/ user@remote.host.com:dir/

If you don't want to be prompted for a password each time rsync makes a connection -- and you don't -- make sure that SSH is set up to log in using a key rather than a password. To do this, create an SSH key on the local machine using ssh-keygen -t ed25519, and press Enter when prompted for a passphrase to leave it empty. (DSA keys are disabled by default in OpenSSH 7.0 and later, so pick a newer key type.) After the key is created, use ssh-copy-id -i ~/.ssh/id_ed25519.pub user@remote.host.com to copy the public key to the remote host.

What if you need to bring back some of the files you copied using rsync? Use the following syntax:

rsync -avze ssh remote.host.com:/home/user/dir/ /local/path/

The z option compresses data during the transfer. If a file you are copying already exists on the local host and is unchanged (by default, rsync compares size and modification time), rsync will just leave it alone -- the same as if you were copying files to a remote host.

Wrapping it up with a script

Once you've figured out what directory or directories you want to sync up, and you've gotten the commands you need to sync everything, it's easy to wrap it all up with a simple script. Here's a short sample:

rsync --progress -avze ssh --delete /home/user/bin/ user@remote.host.com:bin/
rsync --progress -avze ssh --delete /home/user/local/data/ user@remote.host.com:local/data/
rsync --progress -avze ssh --delete /home/user/.tomboy/ user@remote.host.com:.tomboy/

Use the --progress option if you're going to be running rsync interactively. If not, there's no need for it.

If you look at the rsync man page, you can easily be confused. However, after a little practice with rsync, you'll find that it's not hard to set up rsync jobs that will help you prepare for the day that your disk drive craps out and you need access to your data right away.

 

Comments

  • GG Said:

    Hello, rsync needs root to preserve group and owner. User rsa.pub released are fine, root are not. So how to backup a file server keeping permissions and avoiding root?

  • Francois Scheurer Said:

    I wrote a shell script to do snapshot backups of your full filesystem with the speed of rsync. It uses hard links between the backups (deduplication) so that each full backup takes as little disk space as an incremental one. It comes with tuning settings like MD5 integrity signature, 'chattr' protection, filter rules, disk quota, and a retention policy with exponential distribution (backup rotation that keeps more recent backups than older ones). It is here: http://blog.pointsoftware.ch/?p=527&preview=true and it's free! ^^ enjoy Francois Scheurer

  • Francois Scheurer Said:

    sorry the direct URL is http://blog.pointsoftware.ch/index.php/howto-local-and-remote-snapshot-backup-using-rsync-with-hard-links/ cheers francois

  • Scotty C Said:

    Zonker, this is by far the best tutorial I have seen ... and TRUST me, I've scoured the Internet for days. I'm working with sensitive and critical client data that must be backed up off-site, and your article was just what I needed. Thank you very much for taking the time to write this and keep up the incredible work.

  • Elmar Said:

    Great post - and very useful for people who (want to) use rsync! I also spent some time to learn about rsync as I was not fully satisfied with Apple's Time Machine. To do offsite backups, I started developing an rsync helper script for synchronizations between Macs and network attached storage (NAS) devices. After running and refining my script for some time, it now works very well for my purposes. Maybe my script is of interest to people here. I called it Space Machine and made it available on my blog: goo.gl/UNkn3 With Space Machine, each backup job is stored in a small configuration file which is fed to the script. It is possible to set backup intervals, split up big backup tasks into smaller folder jobs and also to create a backup history by transferring the backup into a dated folder upon each run. Upon completion, the script either sends an email or displays a growl notification on my Mac's desktop. By adding the backup jobs to cron, they run in the background according to a predefined schedule without needing any further attention from the user. As I am not a programmer, I would very much appreciate any feedback for further improvement! To make my code easily accessible, I added comments to both the script itself as well to an example backup job file.

  • GrouchyGaijin Said:

    Can I use rsync to copy and rename? For example copy .mutt to an external drive as simply mutt? (I want the backup not hidden so I remember that it is there) ;-)

  • Francois Said:

    yes GrouchyGaijin, e.g.: rsync -Hax /myfolder/.mutt root@TARGET:/otherfolder/mutt

  • Mike Said:

    This is from around 1999 but still the very best rsync tutorial I've ever read: http://tinyurl.com/l37guv8

  • Jason Said:

    Hi, Is there any way to exclude dir or file from rsync --delete. Actually i m syncing DIR1 to DIR2 , and want to run --delete but want to preserve .dir in DIR2.
