
How to Sync Files to Amazon S3 on Linux

Amazon's Simple Storage Service (S3) has a lot to like. It's cheap, can be used for storing a little bit of data or as much as you want, and it can be used for distributing files publicly or just storing your private data. Let's look at how you can take advantage of Amazon S3 on Linux.

Amazon S3 isn't what you'd want to use for storing just a little bit of personal data. For that, you might want to use Dropbox, SpiderOak, ownCloud, or SparkleShare; which one you choose depends on how much data you have, your tolerance for non-free software, and which features you prefer. For my work files, I use Dropbox, in large part because of its LAN sync feature.

But S3 is really good if you need to back up a large amount of data, or a smaller amount that needs an offsite copy. It's also good if you want to use S3 to host files for public distribution and don't have a server, or need to offload data sharing because of capacity issues. Maybe you just want to use it to host a blog, cheaply. S3 also has some nifty features for content distribution and data storage across multiple regions, which we'll get into another time.

Getting the Tools

You can use S3 in a number of ways on Linux, depending on how you'd like to manage your backups. If you look around, you'll find a bunch of tools that support S3, including:

  • S3 Tools is a command line utility that, as the name implies, focuses on Amazon S3.
  • Duplicity is also a command line utility with S3 support, but it supports several other methods of transferring files as well.
  • Deja Dup is a fairly simple GNOME app for backups, which has S3 support thanks to Duplicity.
  • Dragon Disk is a freeware (but not free software) utility that provides more fine-grained control of backups to S3. It also supports Google Cloud Storage and other cloud storage software.

For the purposes of this article, I'm going to focus on S3 Tools. If you're a GNOME user, it should take very little effort to set up Deja Dup for S3. We'll tackle Duplicity and Dragon Disk another time.

S3 Tools

You might find S3 Tools in your distribution's repositories. If not, the S3 Tools folks maintain package repositories for several versions of Red Hat, CentOS, Fedora, openSUSE, SUSE Linux Enterprise, Debian, and Ubuntu. You'll also find instructions for adding the tools on the package repositories page.

Once you have S3 Tools installed, you need to configure it with your Amazon S3 credentials. If you haven't signed up for them yet, hit the Sign Up button at the top of the S3 overview page. You'll also want to look at the pricing, which starts at $0.125 per GB per month.

The pricing calculator can help you estimate how much it would cost to store your data in S3. For example, storing 100GB in S3 would run about $12.50 per month, before any costs for data transfer out of S3. Transfer into S3 is free. Amazon also charges for GET/PUT requests and so forth, so if you're using S3 to serve up content, the price is going to be higher.

Back to the tools. You need to configure s3cmd (the command line utility from the S3 Tools project) like so:

s3cmd --configure

It will walk you through adding your Amazon credentials and GPG information if you want to encrypt files while stored on S3. Amazon's storage is supposed to be private, but you should always assume that data stored on remote servers is potentially visible to others. Since I'm storing information that has no real need for privacy (WordPress backups, MP3s, photos that I'd happily publish online anyway) I don't worry overmuch about encrypting for storage on S3.
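The answers you give during configuration end up in ~/.s3cfg in your home directory. A minimal version of that file looks something like the sketch below; the keys shown are placeholders, not real credentials:

```ini
[default]
access_key = AKIAXXXXXXXXXXXXXXXX
secret_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
use_https = True
gpg_command = /usr/bin/gpg
gpg_passphrase =
```

If you ever need to change your credentials or GPG settings, you can either re-run s3cmd --configure or edit this file directly.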

There's another advantage to forgoing GPG encryption: s3cmd can use an rsync-like algorithm to sync only changed files instead of just re-copying everything.

Now to copy files and use s3cmd sync. You'll find that the s3cmd syntax mimics standard *nix commands. Want to see what is being stored in your S3 account? Use s3cmd ls to show all buckets. (Amazon calls 'em buckets instead of directories.)

Want to copy between buckets? Use s3cmd cp s3://bucket1/filename s3://bucket2/. Note that buckets are specified with the syntax s3://bucketname.
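Putting those two commands together, a quick session might look like the sketch below. The bucket names are hypothetical, and the s3cmd calls are guarded so nothing runs unless s3cmd is actually installed and configured:

```shell
# Hypothetical bucket names; substitute your own.
SRC_BUCKET="s3://my-backups"
DST_BUCKET="s3://my-backups-mirror"

if command -v s3cmd >/dev/null 2>&1; then
    # List all buckets in the account.
    s3cmd ls
    # Copy one object from one bucket to another.
    s3cmd cp "$SRC_BUCKET/site-backup.tar.gz" "$DST_BUCKET/"
fi
```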

To put files in a bucket, use s3cmd put filename s3://bucket/. To get files, use s3cmd get s3://bucket/filename localfile. To upload or download whole directories, you need to add the --recursive option.
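As a concrete sketch of put and get, with a hypothetical bucket and file names, again guarded so the calls are skipped when s3cmd isn't installed:

```shell
BUCKET="s3://my-backups"   # hypothetical bucket name

if command -v s3cmd >/dev/null 2>&1; then
    # Upload a single file into the bucket.
    s3cmd put backup.tar.gz "$BUCKET/"
    # Download it again under a local name.
    s3cmd get "$BUCKET/backup.tar.gz" restored.tar.gz
    # Upload an entire directory tree.
    s3cmd put --recursive photos/ "$BUCKET/photos/"
fi
```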

But if you want to sync files and save yourself some trouble down the road, there's the sync command. It's dead simple to use:

s3cmd sync directory s3://bucket/

The first time you run it, it will copy up all files; after that, it only copies files that don't already exist on Amazon S3. If you also want S3 to drop files that you have removed locally, use the --delete-removed option. Note that you should test this with the --dry-run option first, because it's easy to delete files accidentally.
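A cautious sync run looks something like this sketch; the bucket name is hypothetical, and the --dry-run pass shows what would be uploaded or deleted before anything actually happens:

```shell
SRC="$HOME/Documents/"            # local directory to back up
BUCKET="s3://my-backups/docs/"    # hypothetical destination bucket/prefix

if command -v s3cmd >/dev/null 2>&1; then
    # Preview: list what would be uploaded and what would be deleted remotely.
    s3cmd sync --dry-run --delete-removed "$SRC" "$BUCKET"
    # The real run: upload changes and prune files removed locally.
    s3cmd sync --delete-removed "$SRC" "$BUCKET"
fi
```

The trailing slashes matter to s3cmd's sync, so it's worth keeping them consistent between dry runs and real runs.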

It's pretty simple to use s3cmd, and you should look at its man page as well. It even has some support for the CloudFront CDN service if you need that. Happy syncing!
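s3cmd's CloudFront support lives in a family of cf subcommands; a brief sketch, with a hypothetical bucket and guarded as before:

```shell
BUCKET="s3://my-backups"   # hypothetical bucket name

if command -v s3cmd >/dev/null 2>&1; then
    # List any existing CloudFront distributions.
    s3cmd cflist
    # Create a distribution serving content from the bucket.
    s3cmd cfcreate "$BUCKET"
fi
```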

 

Comments

  • smith Said:

    How can I move files from my current web hosting server directly to Amazon S3?

  • Amr Said:

    You can use http://www.autofilemove.com. Set up an FTP account on your server, and AutoFileMove will help you transfer files from FTP to Amazon S3; you can also set up schedules to automate these transfers or schedule them to run later.

  • xeeen Said:

    Hi, I currently run an online-demo hosting site at www.xeeen.com. How can I run a web app on Amazon S3 but use www.xeeen.com as the web server, so that S3 looks like a virtual directory of www.xeeen.com (like NFS, etc.)?

  • Blaise Said:

    Hello, I have a question about data synchronisation into S3. We are using a sync tool to sync our websites' data to our S3 account. It seems to work well, but we have two problems: we can't preserve file permissions, and we can't preserve symbolic links. Is that normal? Do you know of a tool that can manage that?

  • Amr Said:

    http://www.autofilemove.com is a new tool that can help you, with file filters and rules applied while transferring files.

  • suren Said:

    What system requirements are needed to PUT files into Amazon S3?
