Recovery Monkey: Musings on backups, storage, tuning and more

Sat 15 Aug '09

Should your backups to disk consume more disk than you use for production? Seriously?

So, let’s talk about this not-so-hypothetical customer… They have:

  • A few sites
  • A lot of data per site
  • Much of the data is DBs and Multimedia
  • No replication currently
  • Can’t back up everything currently
  • No proper DR
  • Fairly significant rate of change
  • Not the fastest pipes between sites

They asked me to propose a solution that will back everything up and cross-replicate the backups between the sites. They want to move as far away from tape as possible.

After much deliberation and examination of the data and requirements, we concluded that backing everything up to disk while sticking to their requirements would take about 3x their total production space, and that figure already includes dedupe! I sized the solution with various kinds of dedupe, using best practices for the usual suspects. The culprits are the rate of change and the large amount of data with poor dedupability (that can’t possibly be a word).
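To see how a backup-to-disk estimate can balloon past the size of production, here is a back-of-the-envelope sketch in Python. It is not the sizing method I used for this customer. The function name, the formula (one full plus daily incrementals, divided by a flat dedupe ratio, stored once locally and once at a remote site) and every number in the example are assumptions picked purely for illustration.

```python
def backup_disk_tb(production_tb, daily_change_rate, retention_days,
                   dedupe_ratio, copies=2):
    """Rough estimate of backup disk needed, in TB.

    Assumed model (hypothetical, for illustration only):
      - one full backup plus one incremental per day of retention
      - each incremental is production_tb * daily_change_rate
      - dedupe shrinks the logical total by a flat ratio (2.0 means 2:1)
      - copies=2 means one local copy plus one cross-replicated copy
    """
    if dedupe_ratio <= 0:
        raise ValueError("dedupe_ratio must be positive")
    logical_tb = production_tb * (1 + daily_change_rate * retention_days)
    stored_tb = logical_tb / dedupe_ratio
    return stored_tb * copies


# Hypothetical inputs: 100 TB of production, 5% daily change,
# 30 days of retention, and a poor 2:1 dedupe ratio.
production = 100.0
needed = backup_disk_tb(production, 0.05, 30, 2.0)
multiple = needed / production
print(f"backup disk: {needed:.0f} TB ({multiple:.1f}x production)")
```

With a high rate of change and data that dedupes poorly, the multiple climbs quickly; nudge the dedupe ratio down or the retention up and it gets worse from there.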

So, we declined to propose a solution. I want to sell something as much as the next guy but primarily I want repeat customers and the only way to get a happy repeat customer is to not screw him the first time… And selling them 3x the space only for backups doesn’t make too much sense to me when they could be spending their money much more wisely.

I explained how it doesn’t make sense to spend that kind of money on disk that’s just for backups! After all, backups are a last resort. My list of preferred methods for recovery (from best to worst):

  1. Local and remote replication + application-aware snapshots
  2. Backups to disk
  3. Backups to tape
  4. Snot, a claw hammer, duct tape and baling wire (sometimes actually works better than tape but anyway…)

Wouldn’t it be a slightly better idea to use maybe 2x the disk, possibly even spend less money compared to the backup-only solution, and instead:

  • Cross-replicate the production data for rapid recovery
  • Achieve full local and remote DR
  • Be able to go back in time with snapshots both locally and remotely
  • Replicate the snapshots themselves automatically
  • Still get dedupe but this time on primary storage (make the current storage last longer)
  • Not need a forklift upgrade (investment protection)
  • Reduce or eliminate tape and reliance on the backup software
  • Get even longer retention than with backups to disk
  • Avoid pipe upgrades
  • Drastically simplify administration
  • Potentially save millions over the next few years!

We’ll see what they decide to do. There was tremendous resistance to what I and a horde of seasoned engineers believe is the proper solution, with all kinds of very reasonable excuses being voiced (“we have no time, no resources, the stakeholders don’t care” etc). However, my position on this is clear. Yes, there’s more short-term pain in transforming the infrastructure into the utopian vision of the bullets above, but the long-term gains are staggering!

I’ll let everyone know what happened the moment I hear. This one is really interesting…

D
