☑ May Day! May Day!

Backups are a hassle, off-site ones doubly so. However, there are a few tools which make life easier — this post discusses some of them.

hard drive

You’re going to lose your files. All of them. Maybe not today, maybe not tomorrow. Maybe not even soon. The question is, will it be for the rest of your life?

When I looked up “back up” in the thesaurus it listed its synonyms as “abandon”, “fall back”, “retreat” and “withdraw”, and I’d say that’s a fair characterisation of many people when they try to back up their data. These people are making a rod for their own back, however, and one day it’ll hit them.

OK, so we need to back stuff up, we get told that all the time, usually by very smug people while we’re scrabbling around trying to recover some important report just before it’s due. So what’s the best way to go about it?

There are several elements to a successful backup solution. I’d say first and foremost among them is automation. If you need to do something manually to kick off a backup then, unless you’re inhumanly organised, you’re going to forget to do it eventually. Once you start forgetting, chances are you’re going to keep forgetting, right up until the point you need that backup. Needless to say, that’s a little late.

The second element is history — the ability to recover previous versions of files even after they’ve been deleted. Hardware failure is only one reason to restore from a backup, it’s also not implausible that you might accidentally delete a file, or perhaps accidentally delete much of its contents and save it over the original. If you don’t notice for a few days, chances are a backup solution without history will have quietly copied that broken version of the file over the top of the previous version in your backup, losing it forever.

The third element is off-site — i.e. your backups should be stored at a physically separate location to the vulnerable systems. I’ve heard of at least a couple of cases where people have carefully ensured they backed up data between multiple computers, only to have them all stolen one night. Or a burned in a fire. Or any of a list of other disasters. These occurrences are rare, of course, but not rare enough to rule them out.

The fourth and final element is that only you have access. You might be backing up some sensitive data, perhaps without realising it, so you want to make sure that your backups are useless to someone stealing them. Typically this is achieved by encrypting them. Actually this should be called something like “encryption” or “security” but then the list wouldn’t form the snappy acronym Ahoy1:

  • Automated
  • History
  • Off-site
  • You (have sole access)

So, how can we hit the sweet spot of all four of these goals? Because I believe that off-site backups are so important, I’m going to completely ignore software which concentrates on backing up to external hard disks or DVDs. I’m also going to ignore the ability to store additional files remotely — this is useful, but a true backup is just a copy of what you already have locally anyway. Finally, I’ll skip over the possibility of simply storing everything in the cloud to begin with, for example with services such as Google Docs or Evernote, since these options are pretty self-explanatory.

The first possibilities are a host of subscription-based services which will transparently copy files from your PC up into some remote storage somewhere. Often these are aimed at Windows users, although many also support Macs. Linux support is generally lacking. Services such as Carbonite offer unlimited storage for a fixed annual fee, although the storage is effectively limited by the size of the hard disk in your PC. Others, such as MozyHome prefer to bill you monthly based on your storage requirements. There are also services such as Jungle Disk which effectively supply software that you can use with third party cloud storage services such as Amazon S3.

These services are aimed squarely at general users and they tend to be friendly to use. They also generally keep old versions of files for 1-3 months, which is probably enough to recover from most accidental deletion and corruption. They can be a little pricey, however, typically costing anything from $5 to $10 a month (around £3-£6). This might not be too much for the peace of mind that someone’s doing the hard work for you but remember that the costs can increase as the amount you need to store goes up. Things can get even more expensive for people with multiple PCs or lots of external storage.

It’s hard to judge the security of these services — mostly these services claim to use well known forms of encryption such as Blowfish or AES and, assuming this is true, they’re pretty secure. Generally you can have more trust in a service where you provide the encryption key and where the encryption is performed client-side, although in this case you must, of course, keep the key safe as there’s no way they can recover your data without it. For those of you paying attention you’ll realise this means an off-site copy of your key as well, stored in a secure location, but it does depend how far you want to take it — there’s always a trade-off between security and convenience.

If you don’t mind doing a bit more of the work yourself, there are other options for backup which may be more economical. Firstly, if you already have PCs at multiple locations then you might be interested in the newly-released BitTorrent Sync. Many people may have already heard of the BitTorrent file-sharing protocol and this software is also from the company co-founded by Bram Cohen, the creator of the protocol. However, it has very little to do with public file-sharing, although it’s based on the same protocol under the hood. It’s more about keeping your own files duplicated across multiple devices.

You can download clients for Windows, OSX or Linux and once you’ve configured them, they sit there watching a set of directories. You do this on several machines which all link together and share changes to the files in the watched directories. As soon as you add, delete or edit a file on one machine, the sync clients will share that change across the others. Essentially it’s a bit like a private version of Dropbox.

This is a bit of a cheat in the context of this article, of course, because it doesn’t meet one of my own criteria, storing the history of files — it’s a straight sync tool. I’m still mentioning it for two reasons — firstly, it might form a useful component of another backup solution where some other component provides file history; secondly, they’re my criteria and I’ll ignore them if I want to.

Like BitTorrent, it becomes more efficient as you add more machines to the swarm and it has the ability to share links to other peers so in general you should only need to hook a new machine to one of the others in the cloud and it should learn about the rest. It’s also pretty secure as each directory is associated with a unique key and all traffic is encrypted with it — if a peer doesn’t have the key, it can’t share the files. The data at each site isn’t stored encrypted, however, so you still need to maintain physical security of each system as you’d expect. There’s also the possibility to add read-only and one-time keys for sharing files with other people, but I haven’t tried this personally.

I haven’t played with it extensively yet, but from my early experiments it seems pretty good. It’s synchronisation is fast, its memory usage is low and it seems to make good use of OS-specific features to react to file changes quickly and efficiently.

The main downside at the moment is that it’s still at quite an early stage and should be considered beta quality at best. That said, I haven’t had any problems myself. It’s also closed source which might be a turn-off for some people and it’s not yet clear whether the software will remain available for free indefinitely. It also doesn’t duplicate OS-specific meta-information such as Unix permissions which may be an issue for Linux and potentially OSX users.

On the subject of preserving Unix permissions and the like, what options exist for that? Well, there is a very handy tool called rdiff-backup which is based on rather wonderful rsync. Like rsync it’s intended to duplicate one directory somewhere else, either on the same machine or remotely via an SSH connection. Unlike rsync, however, it not only makes the destination directory a clone of the source, but it also stores reverse-diffs of the files back from that point so you can roll them back to any previous backup point.

I’ve had a lot of success using it, although you need to be fairly technical to set it up as there’s a plethora of command-line options to control what’s included and excluded from the backup, how long to keep historical versions and all sorts of other information. The flip side to this slight complexity is that it’s pretty flexible. It’s also quite efficient on space, since it only stores the differences between files that have changed as opposed to many tools which store an entire new copy of the file.

The one area where rdiff-backup falls down, however, is security — it’s fine for backing up between trusted systems, but what about putting information on cloud storage which you don’t necessarily trust? Fortunately there’s another tool based on rdiff-backup called Duplicity which I’ve only relatively recently discovered.

This is a fantastic little tool which allows you to create incremental backups. Essentially this means that the first time you do a backup, it creates a complete copy of all your files. The next time it stores the differences between the previous backup and the current state of the files, like rdiff-backup but using forward-diffs rather than reverse. This means to restore a backup you need the last full one plus all the incrementals in between.

The clever bit is that it splits your files up into chunks2 and also encrypts each chunk with a passphrase that you supply. This means you can safely deposit those chunks on any third party storage you choose without fear of them sneaking a peek at your files. Indeed, Duplicity already comes with a set of different backends for dumping files on a variety of third party storage solutions including Google Drive and Amazon S3, as well as remote SFTP and WebDAV shares.

It’s free and open source, although just like rdiff-backup it’s probably for the more technically-minded user. It also doesn’t run under Windows3. However, Windows users need not despair — it has inspired another project called Duplicati which is a reimplementation from scratch in C#. I haven’t used this at all myself, but it looks very similar to Duplicity in terms of its basic functionality, although there are some small differences which make it incompatible.

The main difference appears to be that it layers a more friendly GUI for configuring the whole thing, which probably makes it more accessible to average users. It still supports full and incremental backups, compression and encryption just as Duplicity does. It also will run on OSX and Linux with the aid of Mono, although unlike Duplicity it doesn’t currently support meta-information such as Unix permissions4, which probably makes Duplicity a more attractive option for Linux unless you really need to restore on different platforms.

Anyway, that’s probably enough of a summary for now. Whatever you do, however, if you’re not doing backups then start, unless you’re the sort of person who craves disappointment and despair. If not then you’ll definitely regret it at some point. Maybe not today- Oh wait, we’ve done that already.

  1. Everyone knows you need a catchy mnemonic when you’re trying to repackage common sense and sell it to people. 

  2. Bzipped multivolume tar archives, for the technically minded. 

  3. At least not without a lot of faff involving Cygwin and a handful of other packages. 

  4. Although there is an open issue in their tracker about implementing support for meta-information. 

1 May 2013 at 1:09PM by Andy Pearce in Software  | Photo by Patrick Lindenberg on Unsplash  | Tags: backup cloud