To begin each year, Joe Tucci brings 400+ people together for the EMC Leadership Meeting. We spend a little time reflecting on the prior year, but most of it goes to the future. After that, the Backup and Recovery Systems Division leadership spends another day planning our future. So, imagine my surprise when I saw, on the Backup and Recovery Professionals Group on LinkedIn, a thoughtful discussion about the role of tape in the backup environment. I’ve just spent a week discussing cloud, big data, and the evolution of data protection… and we’re still talking about tape? Inconceivable!
While I appreciate both the maturity of the discussion and the resiliency of tape, the conversation is a waste of time. Every moment spent talking about tape is a moment not spent discussing the future of data protection: deduplication, snapshots, versioned replication, cloud, and whatever comes next. The opportunity cost of discussing tape frustrates me.
Tape is not the answer for the future. It’s less and less useful in the present, unless you’re talking about data that you never intend to access again. Here’s the reasoning:
- Full recovery from a complete server or storage array outage: As capacity increases, the only way to recover the data quickly enough to be useful is to have it online and spinning somewhere (e.g., replication). The issue here isn’t so much disk vs. tape as it is tape-centric backup architectures. If you need to wait until all of the data is restored to a new system (and writing data on the recovering system is usually the bottleneck), you’ve been down too long. Tape doesn’t hit the bar here.
- Rollback from corruption: If most of the data is still good, but there’s been some corruption (user- or system-caused), the only way to recover quickly is some sort of changed-block rollback (e.g., snapshot/clone rollback, changed-block recovery for VMs, etc.). In general, tape-centric backup architectures make rollbacks near-impossible.
- Granular recovery: When it comes to granular recovery, it’s all about getting the right version of your data. In this case, recovery is all about backup: when you can back up more frequently and store more copies (space-efficiently, of course), you’re more likely to get the version of the data you want. In general, disk-centric architectures that leverage some sort of data optimization (e.g., dedupe, snapshots, clones) let you back up more often and keep more of those backups.
- Archival recovery: Traditionally, this has been where tape has made its arguments around relevance: long-term, cost-effective, low-power retention. But here’s the problem. In general, we’ve all agreed that backup is non-optimal for data archival. You can rarely track the lifecycle of data (e.g., ‘I want to recover a file from server X, from 12 years ago. Does anybody remember what server the file was on 12 years ago?’), you’re unlikely to have the infrastructure to access it (e.g., ‘Does anybody have a DEC server with application X, version Y?’), and you’re even less likely to have managed the tape infrastructure lifecycle well enough to make the recovery possible. I’ve worked at multiple vendors and watched customers at each go tapeless; they use the transition to disk to re-examine and reduce their retention periods, and to deploy a true archival solution.
I think one customer put it best: “I’m legally required to store data for 30 years, but I’m not required by law or business to ever recover it. That data is perfect for tape.”
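To put rough numbers on the full-recovery point above: if you have to restore everything before you are back up, the time grows linearly with capacity at a fixed write rate. Here is a minimal Python sketch of that arithmetic. The function name `restore_hours` and the 200 MB/s sustained rate are my own illustrative assumptions, not measurements of any tape or disk product, and the sketch ignores mount, seek, and catalog overhead.

```python
def restore_hours(capacity_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to stream capacity_tb (decimal TB) at a sustained rate in MB/s.

    Hypothetical helper for illustration only; real restores add overhead
    that this simple division does not model.
    """
    if capacity_tb < 0 or throughput_mb_s <= 0:
        raise ValueError("capacity must be >= 0 and throughput must be > 0")
    total_mb = capacity_tb * 1_000_000  # 1 TB = 1,000,000 MB (decimal units)
    seconds = total_mb / throughput_mb_s
    return seconds / 3600


if __name__ == "__main__":
    assumed_rate_mb_s = 200  # assumption: sustained write rate on the recovering system
    for capacity_tb in (1, 10, 100):
        hours = restore_hours(capacity_tb, assumed_rate_mb_s)
        print(f"{capacity_tb:>4} TB at {assumed_rate_mb_s} MB/s: {hours:.1f} hours")
```

Whatever rate you plug in, ten times the capacity means ten times the wait, which is why a restore-everything-first architecture stops meeting the bar as data grows.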
Do you think we need to spend more time talking about tape? Do you think tape has a bigger role to play today or in the future? If you had new money to spend, would you put it on tape? Am I being overly dismissive? Please weigh in here or on LinkedIn – Backup & Recovery Professionals Group.