Linux.com

Multiply Your Encrypted Linux Backups with Horcrux

Horcrux is an excellent wrapper around the rsync-based Duplicity, for easily managing automated, encrypted backups to multiple locations.


Horcrux uses what its author, Chris Poole, calls the Voldemort approach, which is multiple backups to multiple locations. For those who are not Harry Potter fans: in that series, a dark wizard or witch can hide a fragment of their soul in a physical object, called a Horcrux. If the physical body is destroyed, the witch or wizard can then be resurrected, so creating multiple Horcruxes is a way to achieve immortality. There is a price to pay, however. Each Horcrux requires an act of murder, and each one diminishes the humanity of its creator.

Fortunately, using Chris Poole's Horcrux doesn't require any awful deeds, but merely editing some configuration files. If you're already a Duplicity user, Horcrux adds the ability to easily send backups to different locations, to encrypt them, and to customize each one if you wish. Horcrux also includes a simple way to test your backups.

Installation

Horcrux is a Bash script. Copy it from the download page into a new text file, give it a name, make it executable, and put it in a directory that's in your path. Make it owned by a user with sufficient permissions to read the files you want backed up. Then run it with no options to generate its global configuration file, ~/.horcrux/horcrux.conf. On my server the script is owned by root:root, and .horcrux/horcrux.conf is in /root.
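The installation steps above can be sketched as a small shell function. This is only an illustration: the function name install_horcrux, the saved filename, and the default target /usr/local/bin are assumptions, not anything Horcrux itself provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Horcrux.
# usage: install_horcrux SAVED_SCRIPT [BIN_DIR]
# Copies the saved script to BIN_DIR/horcrux and makes it executable.
install_horcrux() {
    local src=$1 bin_dir=${2:-/usr/local/bin}   # default directory is an assumption

    if [[ ! -f $src ]]; then
        echo "no such file: $src" >&2
        return 1
    fi

    # install(1) copies the file and sets its mode in one step.
    install -m 0755 "$src" "$bin_dir/horcrux" || return 1

    # Warn if the target directory is not on the search path.
    case ":$PATH:" in
        *":$bin_dir:"*) ;;
        *) echo "note: $bin_dir is not in PATH" >&2 ;;
    esac
}

# Example (run as the user that should own the script):
#   install_horcrux ~/Downloads/horcrux.sh
#   horcrux        # first run with no options writes ~/.horcrux/horcrux.conf
```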

Let's take a walk through horcrux.conf.

source="//" specifies the root directory as the source directory, so you can back up any files in your filesystem. If you prefer, you can narrow it down to a specific source directory:

source="/just/this/directory/"

Always use the full path and don't forget the trailing slash.

encrypt_key=123456 is the key ID of your GPG encryption key. This is optional, but highly recommended for offsite backups. There are a pasquillion how-tos on creating and managing GPG keys, so I shall not repeat them here. I will give you one great tip: the easy way to generate enough entropy when you're creating GPG keys is to run this command in a separate terminal:

$ ls -R /

That recursively lists all the files on your filesystem, which will generate more than enough entropy to keep GPG happy.

Set use_agent=true, because Horcrux will need your GPG passphrase, and gpg-agent is the best way to manage GPG passphrases. (Again, please refer to any of the many good GPG how-tos to learn how to use gpg-agent.)

remove_n=3 means that you will not have more than three full backups, because if you create a fourth full backup the oldest one will be removed. Use this in conjunction with full_if_old= to control how many full backups will be run and saved.

vol_size=250 splits the backup into 250MB volumes. The default is 25MB, which means that large backups create a huge number of files. You might run into filesystem or quota limits with smaller volume sizes. Possible pitfalls with larger volume sizes are unreliable file transfers, and filesystems that don't handle bigger files well, which I don't believe is much of an issue these days.

full_if_old=60 determines how often Horcrux will run a full backup. The default is 360 days. On my server a full backup to the remote backup server takes three days, so I have it run a full backup only every 60 days.
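Put together, the settings discussed above might look like this in horcrux.conf. This is a sketch that uses only the example values from this walkthrough; the generated file may contain other settings, and 123456 is a placeholder for your own key ID.

```shell
# ~/.horcrux/horcrux.conf (sketch using the values discussed above)
source="/just/this/directory/"   # full path, with the trailing slash
encrypt_key=123456               # GPG key ID (placeholder)
use_agent=true                   # let gpg-agent supply the passphrase
remove_n=3                       # keep at most three full backups
vol_size=250                     # volume size in MB
full_if_old=60                   # days between full backups
```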

Where to Send the Backup

Every backup set needs two files: a config file and an exclude file. The config file contains the remote server destination in a form similar to the standard rsync syntax, like this:

destination_path="rsync://[user@host, address obscured in the original page]/dir/"

You can also send backups to local attached media, like this:

destination_path="file:///media/medianame/backup/"

Note that there are three slashes: file:// is the URL scheme, and the third / is the beginning of the file path. The various formats are spelled out in the Duplicity man page.

You can use ssh-agent to manage ssh logins, or use password-less public key authentication.

Which Files to Back Up

Your file selection is written in an -exclude file. Give this file a name that helps you remember what this backup does; for example, dropbox1-exclude. File selection uses the Duplicity syntax. This is a simple example with basic includes and excludes:

/home/data/tmp
+ /home/data
/etc/stuff
+ /etc
**/*

+ means include, and no + means exclude. Items are processed in order and, according to the Duplicity man page, the first rule that matches a file wins, so each exclusion has to appear before the broader include that would otherwise cover it. Here /home/data/tmp is excluded before /home/data is included, and /etc/stuff is excluded before /etc is included. You can use wildcards to select files by file extension, for example to select .png files:


+ /home/data/images/*.png

**/* at the end means "Ignore everything else." A single asterisk is our familiar wildcard that expands to everything except /, and ** is a special globbing pattern that means "everything," including /. So you could exclude all temporary files like this:


**/*tmp

Or include all files with your name in them:


+ **/*carla*

You can specify ranges in square brackets. For example, [Cc]arla matches "carla" with either an upper- or lowercase first letter, and [5-8] matches a range of numbers. If you want to dive into subdirectories, then /home/data/images/*/**.jpg matches all subdirectories after images, and selects all .jpg files in those directories.
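To get a feel for the difference between * and **, here is a rough Bash sketch that translates a pattern into a regular expression and tests paths against it. It is an illustration only: glob_to_regex and matches are hypothetical helpers, and Duplicity's real matching has more rules (for directories, for instance) than this approximation covers.

```shell
#!/usr/bin/env bash
# Illustration only: approximates the * versus ** behavior described above.

# Translate a pattern into an anchored extended regular expression.
glob_to_regex() {
    local glob=$1 regex='' i ch
    for ((i = 0; i < ${#glob}; i++)); do
        ch=${glob:i:1}
        case $ch in
            '*')
                if [[ ${glob:i+1:1} == '*' ]]; then
                    regex+='.*'      # ** matches anything, including /
                    ((i++))          # skip the second asterisk
                else
                    regex+='[^/]*'   # a single * stops at a /
                fi
                ;;
            '.') regex+='\.' ;;      # a dot in the pattern is a literal dot
            *)   regex+=$ch ;;       # everything else, [Cc] included, passes through
        esac
    done
    printf '^%s$' "$regex"
}

# usage: matches PATTERN PATH
matches() {
    local regex
    regex=$(glob_to_regex "$1")
    [[ $2 =~ $regex ]]
}

# A single * does not descend into subdirectories:
for path in /home/data/images/a.png /home/data/images/old/a.png; do
    if matches '/home/data/images/*.png' "$path"; then
        echo "match:    $path"
    else
        echo "no match: $path"
    fi
done
```

Only the first of the two paths should be reported as a match, because the second one contains an extra / that a single asterisk cannot cross.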

First Backup

Now that you have your three required configuration files, you can run your first backup. Chris Poole has done a nice job of streamlining and shortening Duplicity's commands, so you can start your first backup like this:

$ horcrux auto dropbox1

The auto option runs a full backup if it does not find an existing backup set, and an incremental backup otherwise. The rsync algorithm, by way of librsync, is the engine that powers Duplicity and therefore Horcrux, so the first run always takes the longest because it has to copy everything. On subsequent backups only the changes are uploaded.
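The full-versus-incremental rule, combined with the full_if_old setting from horcrux.conf, can be written down as a tiny function. This is a sketch of the rule as described in this article, not Horcrux's or Duplicity's actual code; backup_type is a hypothetical name, and treating a backup as "old" only once it is strictly older than the limit is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the decision, not taken from Horcrux.
# usage: backup_type AGE_DAYS FULL_IF_OLD
#   AGE_DAYS     age in days of the newest full backup, or "" if none exists
#   FULL_IF_OLD  the full_if_old value from horcrux.conf
backup_type() {
    local age=$1 limit=$2
    if [[ -z $age ]] || (( age > limit )); then
        echo full          # nothing to build on, or the last full is too old
    else
        echo incremental   # only changes since the last backup are uploaded
    fi
}

backup_type "" 60    # first run
backup_type 10 60    # recent full backup exists
```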

You can create multiple backups that go to different locations, including local media. All you need are config file and exclude file pairs for each backup. Your backup filenames must be in this format:

backupname-config
backupname-exclude

backupname can use letters, numbers, and punctuation marks, except for hyphens.
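If you maintain several backup sets, a small wrapper can check the naming rules before running each one. The sketch below is hypothetical and not part of Horcrux: check_backup_set, run_backups, and the HORCRUX_DIR variable are made-up names, and it assumes the per-backup files live next to horcrux.conf in ~/.horcrux.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Horcrux.
# Assumption: backup-set files sit next to horcrux.conf in ~/.horcrux.
HORCRUX_DIR=${HORCRUX_DIR:-$HOME/.horcrux}

# Succeeds if NAME has no hyphen and NAME-config and NAME-exclude both exist.
check_backup_set() {
    local name=$1 suffix
    if [[ -z $name || $name == *-* ]]; then
        echo "invalid backup name: '$name' (empty or contains a hyphen)" >&2
        return 1
    fi
    for suffix in config exclude; do
        if [[ ! -f $HORCRUX_DIR/$name-$suffix ]]; then
            echo "missing file: $HORCRUX_DIR/$name-$suffix" >&2
            return 1
        fi
    done
    return 0
}

# Runs "horcrux auto" for each valid backup set named on the command line.
# Returns non-zero if any set was invalid or any backup failed.
run_backups() {
    local name status=0
    for name in "$@"; do
        if check_backup_set "$name"; then
            horcrux auto "$name" || status=1
        else
            status=1
        fi
    done
    return "$status"
}

# Example: run_backups dropbox1 usbdrive
```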

Restoring From Backup

You can restore your entire backup set or specific files and directories. This example restores a single file, and it specifies the backup name and restore directory:

$ horcrux -f myfile restore dropbox1 /restore/directory/myfile

You can go back in time and select an older backup by specifying the date in YYYY-MM-DD format:

$ horcrux -t 2012-08-22 -f myfile restore dropbox1 /restore/directory/myfile

There are several other time specifications, such as n days or weeks ago, which you can find by running horcrux help or consulting the online documentation.
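The two restore commands above differ only in the optional date, so they fold neatly into one function. This is a sketch: restore_file is a hypothetical wrapper, it uses only the -t and -f options and the restore subcommand shown in this article, and it deliberately accepts only YYYY-MM-DD dates rather than the other time specifications.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the restore commands shown above.
# usage: restore_file BACKUP FILE DEST [YYYY-MM-DD]
restore_file() {
    if (( $# < 3 )); then
        echo "usage: restore_file BACKUP FILE DEST [YYYY-MM-DD]" >&2
        return 2
    fi
    local backup=$1 file=$2 dest=$3 date=${4:-}
    local args=()

    if [[ -n $date ]]; then
        # Narrow on purpose: reject anything that is not YYYY-MM-DD.
        if [[ ! $date =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
            echo "date must look like YYYY-MM-DD: $date" >&2
            return 2
        fi
        args+=(-t "$date")
    fi

    args+=(-f "$file" restore "$backup" "$dest")
    horcrux "${args[@]}"
}

# Examples:
#   restore_file dropbox1 myfile /restore/directory/myfile
#   restore_file dropbox1 myfile /restore/directory/myfile 2012-08-22
```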

The simplest way to automate your Horcrux backups is with a cron job, like this:

0 0 * * * horcrux auto dropbox1

That runs it every night at midnight. For help and more information, see the "Horcrux: A Wrapper for Duplicity" page and the Duplicity website.

 

Comments

  • GoinEasy9 Said:

    This is an easy and interesting script. Hopefully I'll find some time to experiment with it. For those who aren't familiar with bash scripts, it's easy enough to follow. It's worth a look.

  • John Said:

    This is nice, but one of the problems with Duplicity is that if you have more than 50 or 100Gb of data, remote full backups take *forever* to complete. And if you are trying to save family photos and you dump 10+g into your pictures folder, how do you keep up? Eventually you have a full backups which is 250g or more and then duplicity breaks down completely, because it doesn't support synthetic fulls on the destination. Not only that, but trying to setup NAS4Free (sorry, I know it's a tangent, but it's part of the story...) to support duplicity is a pain if you try to do a Compact Flash only install, since you can't import the module easily. It's either a full install or nothing, which gets in the way. Sigh... I'll just have to work around it somehow I guess. Anyway, doing remote backups is good!! But having enough bandwidth to do it in a reliable manner, without also blowing through your ISP caps (for upload if nothing else) is tough to manage.

  • Chris Said:

    Yes, I am currently 13+ hours into an encrypted Duplicity remote backup and have only 4.4gigs on the remote server so far, ...and my connection is currently crawling -- maybe it's because I used the asynchronous-upload option? I think I might try and make my own script sometime :(
