
Backing up is for Dumbos

Sunanda recently lost some photos that she'd clicked on her holiday, and I asked what I thought was the obvious: "Where did you back them up?"

A freezing glare followed. "Why," she asked me, "should I have to back it up?"

Putting my foot firmly in my mouth, I persisted: "It's dumb to not back up! Technology 101!!"

"Pretty stupid technology, if you ask me," she said, marching out of the room.

This exchange got me thinking. In all my life as a technology manager (not to mention high priest of family photographs), backup has been a core mantra, but also a colossal pain in the neck. The process takes too long, is never as current as you want, and is invariably missing for the very day that your most important data was being written. Storage did indeed seem more than a bit stupid if so much time and effort was required to protect it from losing data. Technology should be smart enough to do invisibly what is essentially a core maintenance job; why should we need to allocate so much extra time, effort and money towards it? And it is a lot - about a fifth of every day, a bundle of licenses, a few operators and oodles of IOPS are consumed by the average company just for this - and it's been that way since we left punch cards behind.

Let's go back to the basics. Why is backup needed at all? If you have a passive (let's not say dumb) storage system, backup does two things the storage cannot do for itself:
  1. Protect against failures such as disk crash
  2. Undo accidental changes to the data by going back a few days in time.
Protection against failure was important - especially because disk crashes used to be pretty common once upon a time (even a misplaced sneeze could do bad things). Today's disks, however, don't fail quite as much, and there are plenty of recovery mechanisms when they do. Nutanix has built its core around keeping multiple copies of each block on multiple separate hardware nodes, ensuring near-perfect availability. Creating and restoring copies (pretty much what a backup is supposed to do) is continuous, invisible and instantaneous (no chasing IT with a service request). The only thing Nutanix won't cover is the whole cluster or datacenter going up in flames, which is not an entirely trivial problem to solve, but I suspect no worse than what their Xi-Leap is anyway trying to do. Why then are we breaking our necks backing up something that isn't going to fail?
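To make the idea concrete, here's a toy sketch of the principle (my own illustration in Python - the function, node names and replication factor are hypothetical, not how Nutanix actually implements it): put each block's copies on distinct nodes, and no single node failure can lose data.

```python
import itertools

def place_replicas(block_ids, nodes, rf=2):
    """Assign each block to `rf` distinct nodes, round-robin around
    the cluster, so losing any one node leaves a copy intact."""
    if rf > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    ring = itertools.cycle(range(len(nodes)))
    for block in block_ids:
        start = next(ring)
        # rf consecutive positions on the ring are always distinct nodes
        placement[block] = [nodes[(start + i) % len(nodes)] for i in range(rf)]
    return placement

placement = place_replicas(["b1", "b2", "b3"], ["nodeA", "nodeB", "nodeC"], rf=2)
# every block now lives on two different nodes
for copies in placement.values():
    assert len(set(copies)) == 2
```

The point of the sketch: "backup" here isn't a nightly job, it's a property of where the blocks sit in the first place.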

DISCLAIMER: Nutanix does not run on the pendrives storing Sunanda's photos.

This is, of course, going to be deeply uncomfortable for a lot of people. Backups are such an integral way of life for a generation of IT operators that, no matter how rational the argument, the habit is a hard one to give up. Not having a recovery you can touch is scary. Then there are regulators to convince, some of whom mandate explicit backup requirements. As a basic principle, though, it's a compelling one: storage should handle all its backing up and recovering invisibly and continuously - and to the user should appear never to fail.

Accidental deletion or corruption is a more complicated matter; consistency, quiescence and all kinds of other tricky issues come charging in. These are, however, hardly unsolved problems. Snapshots have been around for a while and are quite effective at this - it's just not invisible or automatic. Think of how life would be if you could just tell your storage to snapshot itself every hour for up to thirty days - and be able to recover, time-machine style, back to ten minutes before the ransomware hit you. Why should this not be a basic feature of enterprise storage, given that I don't know of any enterprise that does not need it?
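The mechanics are simple enough that the whole idea fits in a toy sketch (in Python; the class and policy below are my own hypothetical model, not any vendor's product): take a snapshot every hour, prune anything older than thirty days, and restore to the last snapshot at or before any moment you name.

```python
from datetime import datetime, timedelta

class TimeMachine:
    """Toy model of automatic snapshots: hourly, kept for 30 days."""
    RETENTION = timedelta(days=30)

    def __init__(self):
        self.snapshots = []  # (timestamp, state) pairs, oldest first

    def snapshot(self, now, state):
        self.snapshots.append((now, state))
        # prune anything that has aged out of the retention window
        cutoff = now - self.RETENTION
        self.snapshots = [(t, s) for t, s in self.snapshots if t >= cutoff]

    def restore(self, moment):
        """Return the last state captured at or before `moment`."""
        candidates = [(t, s) for t, s in self.snapshots if t <= moment]
        if not candidates:
            raise LookupError("no snapshot that old")
        return max(candidates)[1]

tm = TimeMachine()
t0 = datetime(2019, 6, 1, 0, 0)
for hour in range(5):
    tm.snapshot(t0 + timedelta(hours=hour), f"state@{hour}h")

# ransomware hits at 03:10; dial back to the 03:00 snapshot
clean = tm.restore(datetime(2019, 6, 1, 3, 10))
# clean == "state@3h"
```

Everything above is policy, not rocket science - which is exactly why it feels like it belongs inside the storage rather than bolted on top.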

There's also the debate about where such versioning and snapshots should reside - at the OS level (file or object versioning) or at the storage level (block versioning). 

Here's my opinion. Versioning at the file or object level is good for individuals, but enterprises often need to restore whole volumes; smart storage should offer a time machine at the block level. Operating systems or application software can still offer file and object versioning, but the big restores should come from the storage. Again, hardly a new idea - storage vendors have been peddling snapshot and restore capability as an add-on for ages; it's just that it is neither automatic nor invisible.
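Why is the block-level restore so cheap? Because with copy-on-write, a snapshot is just a frozen copy of the block map, and a rollback is one map swap rather than millions of per-file operations. A minimal sketch, assuming a hypothetical volume (the class below is illustrative, not any real array's implementation):

```python
class Volume:
    """Toy copy-on-write volume: a snapshot freezes the block map;
    the block data itself is shared, not duplicated."""

    def __init__(self):
        self.blocks = {}  # block number -> data
        self.snaps = {}   # snapshot name -> frozen block map

    def write(self, blockno, data):
        self.blocks[blockno] = data

    def snapshot(self, name):
        # copies only the map (references), not the block contents
        self.snaps[name] = dict(self.blocks)

    def rollback(self, name):
        # restores the entire volume in one step - no per-file work
        self.blocks = dict(self.snaps[name])

vol = Volume()
vol.write(0, "boot")
vol.write(1, "data-v1")
vol.snapshot("hourly-0300")
vol.write(1, "data-corrupted")
vol.rollback("hourly-0300")
# vol.blocks[1] is back to "data-v1"
```

File-level versioning still has its place for the "oops, wrong document" cases; the sketch is just the argument for why the big restores belong a layer down.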

Every enterprise needs this, and spends enormous time and energy today obtaining a poorer version of it. That's why I argue this is a core function of smart storage, not an optional add-on.

There's also another use backups are put to - long-term archival. This is where data is stored for years, even decades, to aid investigations and disputes rather than operational recovery. Should this become part of smart storage? I personally don't think so - archiving is a different need, generally requires a different kind of media, and is meant for particular files and data sets rather than for the underlying storage.

