
Backing up is for Dumbos

Sunanda recently lost some photos that she'd clicked on her holiday, and I asked what I thought was the obvious: "Where did you back them up?"

A freezing glare followed. "Why," she asked me, "should I have to back it up?"

Putting my foot firmly in my mouth, I persisted: "It's dumb to not back up! Technology 101!!"

"Pretty stupid technology, if you ask me," she said, marching out of the room.

This exchange got me thinking. In all my life as a technology manager (not to mention high priest of family photographs), backup has been a core mantra, but also a colossal pain in the neck. The process takes too long, is never as current as you want, and is invariably missing for the very day that your most important data was being written. Storage did indeed seem more than a bit stupid if so much time and effort was required to protect it from losing data. Technology should be smart enough to do invisibly what is essentially a core maintenance job; why should we need to allocate so much extra time, effort and money towards it? And it is a lot - about a fifth of every day, a bundle of licenses, a few operators and oodles of IOPS are consumed by the average company just for this - and it's been that way since we left punch cards behind.

Let's go back to the basics. Why is backup needed at all? If you have a passive (let's not say dumb) storage system, backup does two things the storage cannot do for itself:
  1. Protect against failures such as disk crash
  2. Undo accidental changes to the data by going back a few days in time.
Protection against failure was important - especially because disk crashes used to be pretty common once upon a time (even a misplaced sneeze could do bad things). Today's disks, however, don't fail quite as much, and there are plenty of recovery mechanisms when they do. Nutanix has built its core around keeping multiple copies of each block on multiple separate hardware nodes, ensuring near-perfect availability. Creating and restoring copies (pretty much what a backup is supposed to do) is continuous, invisible and instantaneous (no chasing IT with a service request). The only thing Nutanix won't cover is the whole cluster or datacenter going up in flames, which is not an entirely trivial problem to solve, but I suspect no worse than what their Xi-Leap is anyway trying to do. Why then are we breaking our necks backing up something that isn't going to fail?
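To make the idea concrete, here's a toy sketch of the principle (my own illustration in Python - the function, node names and replication factor are hypothetical, not how Nutanix actually implements it): put each block's copies on distinct nodes, and no single node failure can lose data.

```python
import itertools

def place_replicas(block_ids, nodes, rf=2):
    """Assign each block to `rf` distinct nodes, round-robin around
    the cluster, so losing any one node leaves a copy intact."""
    if rf > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    ring = itertools.cycle(range(len(nodes)))
    for block in block_ids:
        start = next(ring)
        # rf consecutive positions on the ring are always distinct nodes
        placement[block] = [nodes[(start + i) % len(nodes)] for i in range(rf)]
    return placement

placement = place_replicas(["b1", "b2", "b3"], ["nodeA", "nodeB", "nodeC"], rf=2)
# every block now lives on two different nodes
for copies in placement.values():
    assert len(set(copies)) == 2
```

The point of the sketch: "backup" here isn't a nightly job, it's a property of where the blocks sit in the first place.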

DISCLAIMER: Nutanix does not run on the pendrives storing Sunanda's photos.

This is, of course, going to be deeply uncomfortable for a lot of people. Backups are such an integral way of life for a generation of IT operators that, no matter how rational the argument, the habit is a hard one to give up. Not having a recovery you can touch is scary. Then there are regulators to convince, some of whom mandate explicit backup requirements. As a basic principle, though, it's a compelling one: storage should handle all its backing up and recovering invisibly and continuously - and to the user should appear never to fail.

Accidental deletion or corruption is a more complicated matter; consistency, quiescence and all kinds of other tricky issues come charging in. These are, however, hardly unsolved problems. Snapshots have been around for a while and are quite effective at this - it's just not invisible or automatic. Think of how life would be if you could just tell your storage to snapshot itself every hour for up to thirty days - and be able to recover, time-machine style, back to ten minutes before the ransomware hit you. Why should this not be a basic feature of enterprise storage, given that I don't know of any enterprise that does not need it?
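The mechanics are simple enough that the whole idea fits in a toy sketch (in Python; the class and policy below are my own hypothetical model, not any vendor's product): take a snapshot every hour, prune anything older than thirty days, and restore to the last snapshot at or before any moment you name.

```python
from datetime import datetime, timedelta

class TimeMachine:
    """Toy model of automatic snapshots: hourly, kept for 30 days."""
    RETENTION = timedelta(days=30)

    def __init__(self):
        self.snapshots = []  # (timestamp, state) pairs, oldest first

    def snapshot(self, now, state):
        self.snapshots.append((now, state))
        # prune anything that has aged out of the retention window
        cutoff = now - self.RETENTION
        self.snapshots = [(t, s) for t, s in self.snapshots if t >= cutoff]

    def restore(self, moment):
        """Return the last state captured at or before `moment`."""
        candidates = [(t, s) for t, s in self.snapshots if t <= moment]
        if not candidates:
            raise LookupError("no snapshot that old")
        return max(candidates)[1]

tm = TimeMachine()
t0 = datetime(2019, 6, 1, 0, 0)
for hour in range(5):
    tm.snapshot(t0 + timedelta(hours=hour), f"state@{hour}h")

# ransomware hits at 03:10; dial back to the 03:00 snapshot
clean = tm.restore(datetime(2019, 6, 1, 3, 10))
# clean == "state@3h"
```

Everything above is policy, not rocket science - which is exactly why it feels like it belongs inside the storage rather than bolted on top.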

There's also the debate about where such versioning and snapshots should reside - at the OS level (file or object versioning) or at the storage level (block versioning). 

Here's my opinion. Versioning at the file or object level is good for individuals, but enterprises often need to restore whole volumes; smart storage should offer a time machine at the block level. Operating systems or application software can still offer file and object versioning, but the big restores should come from the storage. Again, hardly a new idea - storage vendors have been peddling snapshot and restore capability as an add-on for ages; it's just that it is neither automatic nor invisible.
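Why is the block-level restore so cheap? Because with copy-on-write, a snapshot is just a frozen copy of the block map, and a rollback is one map swap rather than millions of per-file operations. A minimal sketch, assuming a hypothetical volume (the class below is illustrative, not any real array's implementation):

```python
class Volume:
    """Toy copy-on-write volume: a snapshot freezes the block map;
    the block data itself is shared, not duplicated."""

    def __init__(self):
        self.blocks = {}  # block number -> data
        self.snaps = {}   # snapshot name -> frozen block map

    def write(self, blockno, data):
        self.blocks[blockno] = data

    def snapshot(self, name):
        # copies only the map (references), not the block contents
        self.snaps[name] = dict(self.blocks)

    def rollback(self, name):
        # restores the entire volume in one step - no per-file work
        self.blocks = dict(self.snaps[name])

vol = Volume()
vol.write(0, "boot")
vol.write(1, "data-v1")
vol.snapshot("hourly-0300")
vol.write(1, "data-corrupted")
vol.rollback("hourly-0300")
# vol.blocks[1] is back to "data-v1"
```

File-level versioning still has its place for the "oops, wrong document" cases; the sketch is just the argument for why the big restores belong a layer down.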

Every enterprise needs this, and spends enormous time and energy today obtaining a poorer version of it. That's why I argue this is a core function of smart storage, not an optional add-on.

There's also another use backups are put to - long-term archival. This is where data is stored for years, even decades, to aid investigations and disputes rather than operational recovery. Should this become part of smart storage? I personally don't think so - archiving is a different need, generally requires a different kind of media, and is meant for particular files and data sets rather than for the underlying storage.

